This invention provides plant arabinogalactan proteins (AGPs) and their genes. AGPs were isolated from Nicotiana alata, Nicotiana
plumbaginafolia, and Pyrus communis. Amino acid sequences of isolated AGP peptide molecules are presented. Amino acid sequences of
the isolated AGP molecules were used to synthesize oligonucleotide probes, to prepare oligonucleotide primers for PCR, or to prepare
RNA probes to screen cDNA libraries of N. alata, N. plumbaginafolia, and P. communis. cDNA clones encoding amino acid sequences of
isolated AGP molecules were isolated. The invention presents for the first time an intact AGP amino acid sequence derived from a
corresponding AGP gene. The instant invention also provides methods useful in obtaining AGP genes encoding an AGP peptide comprising
a specific isolated hydroxyproline-rich (OAST-rich) sequence or a specific isolated hydroxyproline-poor sequence.

PLANT ARABINOGALACTAN PROTEIN (AGP) GENES

Field of the Invention

The subject matter of the invention relates to the isolation of arabinogalactan
proteins (AGPs) from plants, e.g., Nicotiana alata, Nicotiana plumbaginafolia and
Pyrus communis, and the utilization of amino acid sequences of various AGP fragments
for the isolation of corresponding plant genes encoding the protein backbone of AGPs.

Background of the Invention

Arabinogalactan proteins (AGPs) are found in flowering plants from every taxonomic
group tested. These proteoglycans are widely distributed in most higher plants,
occurring in almost all tissues including leaves, stems, roots, floral parts, and
seeds, and in many of their secretions. The multi-site localization of AGPs appears
to be analogous to the multi-site localization of some animal proteoglycans. As
regards chemical structure, however, little similarity seems to exist between plant
AGPs and animal proteoglycans.

The AGPs are a family of structurally related glycosylated molecules containing
high proportions of carbohydrate and usually less than 10 percent by weight of
protein [Clarke et al. (1978) Aust. J. Plant Physiol. 5:707-722; Fincher et al.
(1983) Ann. Rev. Plant Physiol. 34:47-70], although AGPs having a protein content
of about 59% are known [Fincher et al. (1983), supra; Anderson et al. (1979)
Phytochem. 18:609-610]. The carbohydrate consists of polysaccharide chains having
a 1,3-β-D-galactopyranosyl backbone and side chains of (1,3-β- or 1,6-β-)D-galactopyranosyl
(Galp) residues, often terminating in β-D-Galp and α-L-arabinofuranosyl (Araf)
residues [Fincher et al. (1983), supra]. Other neutral sugars and uronic acids have
also been detected, although at low levels. Monosaccharides which can be present are
L-rhamnopyranose, D-mannopyranose, D-xylopyranose, D-glucopyranose, D-glucuronic acid
and its 4-O-methyl derivative, and D-galacturonic acid and its 4-O-methyl derivative
[Fincher et al. (1983), supra]. In most cases, however, galactose (Gal) and arabinose
(Ara) predominate.

The protein content is usually between two and ten percent [Fincher et al.
(1983), supra]. Relatively little is known about the structure and organization of the
protein core of AGPs, except that the protein appears to be rich in alanine (Ala),
hydroxyproline (Hyp), serine (Ser), and threonine (Thr) [Fincher et al. (1983),
supra]. Prior to the present invention, the entire amino acid sequence of an intact
isolated AGP had not been publicly available. The high carbohydrate content of
AGPs appears to cause difficulties in sequencing; attempts to chemically remove the
carbohydrate moiety usually result in incomplete deglycosylation and products with
variable levels of carbohydrate content. The carbohydrate-protein linkage has been
identified as a β-galactosyl-hydroxyproline linkage in AGPs isolated from wheat and
ryegrass [Gleeson et al. (1985) AGP News 5:30-36 and McNamara and Stone (1981)
Lebensm.-Wiss. u.-Technol. 14:182-187].

AGPs are components of gum arabic, a gummy exudation originating from the
Acacia tree and known to be produced by stress conditions such as heat, drought, and
wounding [Clarke et al. (1979) Phytochemistry 18:520-540]. The gum finds wide use
as a flavor encapsulator in dry mix products such as puddings, desserts, cake mixes
and soup mixes, and is also used to emulsify essential oils in soft drinks and to
prevent sugar crystallization in confectionery products [Randall et al. (1989) Food
Hydrocolloids 3:65-75]. The significance of the protein component to the overall
structural and functional characteristics of gums has been realized [Vandevelde et al.
(1985) Carbohydr. Polymers 5:251-273; Connolly et al. (1987) Food Hydrocolloids
1:477-480 and Connolly et al. (1988) Carbohydr. Polymers 8:23-32]. The
importance of the protein-rich fraction to the emulsification properties of the gum has
been demonstrated [Randall et al. (1988) Food Hydrocolloids 2:131-140].

The production of natural complex carbohydrates or polysaccharides is
frequently problematic. For plant exudates and seed or root extracts, production is
dependent on climate and harvest conditions. For example, the production of gum
arabic in Africa, the main region of origin, can vary each year as a function of
weather conditions (particularly drought), labor supply, natural disasters, political
conditions, etc. [Meer et al. (1975) Food Technology 29:22-30]. The unreliable
supply results in variable gum arabic cost. Agar from seaweed extracts is expensive
due to harvesting costs. Further, hand harvesting of agar and seed gums such as guar
gums can introduce a purity problem. Thus, there is a clear need in a number of
industries for a reliable, relatively inexpensive gum or class of gums.

Cultured plant cell gums containing AGP can be used as a substitute for prior
art gums, such as gum arabic, guar gum, xanthan gum, alginic acid, agar, calcium
alginate, carrageenan, karaya gum, locust bean gum, potassium or sodium alginate,
tragacanth gum or others. For example, the isolated or cultured plant cell gums can
be used as thickening or emulsifying agents and in adhesives, inks, paints, toothpaste,
cosmetics, pharmaceuticals, textile printing, sizing and coating, oil-well drilling
muds, concrete, etc. Often, plant cell gums can be used in smaller quantities than
prior art gums to achieve equivalent functional utility.

AGPs function in several biological processes including plant development,
cell-cell adhesion, pollen-stigma recognition, water retention, and disease resistance.
AGPs may serve as glues or provide nutrients for growing pollen tubes. It has been
suggested [Fincher et al. (1983), supra] that AGP proteins may interact with lectins or
other proteins in the extracellular spaces and may be involved in the cellular response
to extracellular oligosaccharide signal molecules [Norman et al. (1990) Planta
181:365-373]. Since AGPs interact with Yariv antigens and flavonol glycosides
[Jermyn (1978) J. Plant Physiol. 5:563-571], they have been thought to have lectin-like
properties. The molecular structure of AGPs has been proposed [Randall et al.
(1989) Food Hydrocolloids 3:65-75] to resemble a type of block copolymer wherein
carbohydrate blocks are covalently linked to a central polypeptide chain, thus
explaining its ability to sterically stabilize emulsions and dispersions.

Plant AGP genes are not known in the prior art and the nucleotide sequence of
a plant AGP gene has not been published prior to the present invention. Very
recently, it was reported [Sheng et al. (1993) Abstract no. 639 in Supplement to Plant
Physiol. 102, Number 1, May 1993] that a PCR strategy is being used to clone potato
tuber lectin, extensins and AGP sequences from a potato tuber cDNA library. It was
reported that PCR products which hybridized to a carrot extensin probe gave several
putative clones which are currently under investigation. No clones corresponding to
AGP genes were disclosed.

The process of obtaining an AGP clone has been found to be complex and
problematic. Three of the problems associated with AGPs and their genes are (1) the
difficulty of identifying a single AGP species, as AGPs are often present as members
of closely related molecular species; (2) the very high redundancy associated with
the characteristic amino acid sequence of an AGP peptide, i.e., (a) a high
hydroxyproline content and (b) regions containing a high content of hydroxyproline,
alanine, serine, and threonine (OAST); and (3) the GC-richness of corresponding
oligonucleotides, leading to problems with the specificity of hybridization. Indistinct
and imprecise alignment during nucleic acid hybridization, for example, in the PCR
technique, results in the amplification of incorrect sequences when compared to the
original template, and has resulted in a lack of success in obtaining an AGP clone.
Plants are also known to contain a variety of glycine-rich proteins which
are also encoded by GC-rich DNA. Applicants' disclosure circumvents this problem
and enables the isolation of AGP genes.
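
The redundancy and GC-richness described above can be made concrete with a short Python sketch. It reverse-translates a peptide with the standard genetic code and reports how many DNA sequences could encode it and the range of GC content across those sequences. The two peptides are hypothetical examples chosen only for illustration and are not sequences from this disclosure; treating hydroxyproline ("O") as encoded by proline codons is an assumption based on hydroxylation being post-translational.

```python
from math import prod

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of DNA codons that encode it.
CODONS = {}
for codon, aa in zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO):
    CODONS.setdefault(aa, []).append(codon)
# Assumption: hydroxyproline ("O") is reverse-translated with proline codons.
CODONS["O"] = CODONS["P"]


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def gc_range(peptide):
    """(lowest, highest) GC fraction over all possible coding sequences."""
    low = sum(min(gc_count(c) for c in CODONS[aa]) for aa in peptide)
    high = sum(max(gc_count(c) for c in CODONS[aa]) for aa in peptide)
    return low / (3 * len(peptide)), high / (3 * len(peptide))


# Hypothetical peptides, for illustration only.
for name, pep in [("OAST-rich", "OAOSOAT"), ("Hyp-poor", "MKDEWFN")]:
    low, high = gc_range(pep)
    print(f"{name}: {degeneracy(pep)} coding sequences, GC {low:.2f}-{high:.2f}")
```

For these two examples, even the least GC-rich coding sequence for the OAST-rich peptide has a higher GC fraction than the most GC-rich coding sequence for the hydroxyproline-poor peptide, and it has far more possible coding sequences, which is consistent with the hybridization problems noted above.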

Two approaches to the isolation of the AGPs from plant extracts have been
used in previous studies. One approach consists of classical fractionation of plant
extracts [Fincher et al. (1974) Aust. J. Biol. Sci. 27:117-132; Aspinall (1969) Adv.
Carbohydrate Chem. 24:333-379]. Another approach to the isolation of AGPs from
plant extracts is precipitation with a class of dyes prepared by coupling diazotized
4-aminophenyl glycosides to phloroglucinol [Jermyn et al. (1975), supra]. These dyes
were first prepared by Yariv et al. [(1962) Biochem. J. 85:383-388] as precipitating
antigens for antibodies to glycoside determinants, and the β-glycosyl artificial
carbohydrate antigen was shown to precipitate an arabinose-and-galactose-containing
polymer from soya bean, jack bean and maize [Yariv et al. (1967) Biochem. J.
105:1c-2c]. Since then, this precipitation reaction has been widely used to isolate
AGPs.

These dyes have also been used as cytochemical reagents for the localization
of AGPs in plant tissues [Clarke et al. (1975) J. Cell Sci. 19:157-167; Clarke et al.
(1978) Q. Rev. Biol. 53:3-28]. The nature of the binding of AGP to the Yariv
reagent is not understood, but it is likely to involve both carbohydrate and protein
residues. The binding of Yariv's reagent to AGP is not affected by removal of the
arabinose residues [Gleeson et al. (1979), supra; Akiyama et al. (1981), supra], but is
abolished by progressive acid hydrolysis of the AGP [Fincher et al. (1983), supra].

In higher plants AGPs are classified as belonging to a group of proteins
characterized by their high hydroxyproline content. These hydroxyproline-rich
glycoproteins (HRGPs) are characterized by carbohydrate side chains that contain
arabinose and galactose. The group has been traditionally divided into three main
classes: the cell wall associated extensins, the soluble arabinogalactan-proteins
(AGPs), and the solanaceous lectins. The differences between these groups are
summarized in Table 1.0. The most important factors in the classification of the
HRGPs are: the amount, composition, and sequence of their carbohydrate
component; the sequence and composition of the polypeptide backbone; the linkage
between carbohydrate and protein; and its localization.

A new group of proteins, the proline-rich proteins (PRPs), has been described
recently. The PRPs have also been referred to as the hydroxyproline/proline-rich
proteins or the repetitive proline-rich proteins. Amino acid compositions of some
PRPs [Averyhart-Fullhard et al. (1988) Proc. Natl. Acad. Sci. 85:1082-1085; Datta et al.
(1989) Plant Cell 1:945-952; Kleis-San Francisco et al. (1990) Plant Physiol.
94:1897-1902] indicated equimolar amounts of proline and hydroxyproline.
However, the PRPs do not appear to be glycosylated and, in this way, are 
distinguished from the HRGPs (hydroxyproline-rich glycoproteins). 

Summary of the Invention

The present invention provides for the first time DNA fragments encoding protein
backbones of plant arabinogalactan proteins (nonglycosylated AGPs). Specific
embodiments of the invention present cDNA clones encoding nonglycosylated AGPs from
cell suspension cultures of Nicotiana alata (NaAGP1), Nicotiana plumbaginafolia
(NpAGP1), and Pyrus communis (PcAGP23, PcAGP9, and PcAGP2) and from Nicotiana alata
styles (Na35_1 and AGPNa11). Full length and partial nucleotide sequences of the
cDNAs encoding said nonglycosylated AGPs are disclosed. DNA recombinant vectors
containing these cDNAs are also provided. In further embodiments of the invention,
genomic DNAs encoding plant nonglycosylated AGPs and recombinant vectors containing
said genomic DNAs are provided. This invention further contemplates the use of
oligonucleotide probes based on the amino acid sequences of plant AGPs for the
detection of hybridizing sequences and the isolation of plant AGP genes.

The invention also provides isolated plant AGP peptides and amino acid
sequences of AGP peptide fragments. AGP peptides were isolated from Nicotiana
alata, Nicotiana plumbaginafolia, and Pyrus communis. The amino acid sequences
obtained from isolated AGP peptide fragments were either enriched in hydroxyproline
or not enriched in hydroxyproline. In particular, hydroxyproline-enriched sequences
were characterized by having (i) a high content of hydroxyproline and/or (ii) a high
content of hydroxyproline, alanine, serine, and threonine (OAST-enriched). The
sequences that were immediately useful in obtaining an AGP gene were those
sequences that were (i) not enriched in hydroxyproline, and/or (ii) not enriched in
hydroxyproline, alanine, serine, and threonine content (not OAST-enriched). Prior to
the present invention, the amino acid sequence of an intact plant AGP had not been
publicly available. cDNAs thought to encode AGPs have been described, but
evidence of a match between these sequences and amino acid sequence data from
isolated AGPs is missing in these cases.

Table 1.0
Biochemical and structural features of hydroxyproline-rich glycoproteins (HRGPs)

Property | Extensins | Arabinogalactan-proteins (AGPs) | Solanaceous lectins
% Protein (w/w) | 40-50 | 2-10 | 50-60
Galactose/ | <1 | >1 | <1
Galactose linkage types | terminal | 1,3-linked; 1,3,6-linked; 1,6-linked; terminal | terminal
Arabinose linkage types | 1,2-linked; 1,3-linked; terminal | terminal | 1,2-linked; 1,3-linked; terminal
Glycopeptide linkages | O-linked: Ara-Hyp & Gal-Ser | O-linked: Gal-Hyp | O-linked: Ara-Hyp & Gal-Ser
Abundant amino acids | Hyp, Lys, Tyr, Ser & Pro | Hyp, Ala & Ser | Hyp, Cys, Gly & Ser
mol % Hyp (of protein domains) | >30 | >15 | >13
Amino acid repeats | Ser-(Hyp)4 | ? |
Isoelectric point | 9.5-11 | 2-5 | 9.5
Localization | Cell wall | Extracellular matrix; plasma membrane | Cytoplasm & vacuole
β-glucosyl Yariv reagent binding | No | Yes | No



The invention further provides a substantially pure AGP having an amino acid 
sequence which is essentially that derived from a nucleotide sequence of an AGP 
gene. Specific embodiments of the invention provide an AGP comprising an amino 
acid sequence consisting essentially of that derived from the nucleotide sequence of an 
AGP gene from Nicotiana alata, Nicotiana plumbaginafolia, or Pyrus communis.

It is also an object of the invention to provide a method for obtaining a plant
AGP gene. This method comprises the step of obtaining from an AGP peptide a
fragment having an amino acid sequence that is hydroxyproline-poor, e.g., not
enriched in OAST content. This hydroxyproline-poor sequence is then used to design
a nucleotide primer which can be used to obtain, for example, a PCR fragment useful
in screening a plant gene library for a hybridizing clone. This approach is contrary
to that generally used. Usually, a sequence which particularly characterizes an AGP
(i.e., a sequence that is hydroxyproline-rich or enriched in OAST content) is utilized
to design an oligonucleotide primer for use in attempts to obtain a hybridizing clone.
In Applicants' approach, a hydroxyproline-rich peptide sequence which particularly
characterizes an AGP protein is not utilized, and is avoided; instead, a sequence
which does not comprise a characterizing sequence of an AGP (i.e., a
hydroxyproline-poor sequence) is utilized for the isolation of an AGP gene. In
specific embodiments of the invention, amino acid sequences which were not enriched
in hydroxyproline or OAST content were isolated from peptides from AGPs isolated
from culture filtrates of suspension cultured cells or from style extracts of N. alata,
N. plumbaginafolia, and P. communis. These sequences enabled the isolation of
corresponding cDNA clones.

The present invention also provides a method for obtaining an AGP gene by
utilizing a hydroxyproline-rich AGP sequence. Prior to the instant disclosure, public
knowledge of hydroxyproline-rich AGP fragments has not enabled the isolation of
corresponding AGP genes, due to difficulties imposed by resultant GC-rich domains.
A method is provided herein that enables the use of a specific hydroxyproline-rich
AGP peptide sequence for the isolation of a corresponding gene. The approach for
using a hydroxyproline-rich sequence comprises the use of long guessmers combined
with single-stranded antisense RNA probes for the screening of a library. The use of
a long guessmer together with an RNA probe overcomes the problems presented upon
using short oligonucleotide probes. A long guessmer can more easily accommodate
mismatches, and the use of an antisense RNA probe allows "U" to be used at the third
position of the anticodon for AST amino acids, thus increasing the likelihood of the
guessmer hybridizing to the target sequence. The resultant RNA molecule can be
heavily labeled, permitting greater levels of detection, and also can bind more
strongly to its target sequences than a DNA probe.

The invention also provides specific AGP cDNA sequences and specific
oligonucleotide probe sequences for screening cDNA libraries to isolate specific plant
AGP genes. For example, in specific embodiments, the following cDNA clones are
provided:

Source                                          cDNA clone
N. alata cell suspension culture                NaAGP1 (SEQ ID NO:24)
N. plumbaginafolia cell suspension culture      NpAGP1 (SEQ ID NO:25)
P. communis cell suspension culture             PcAGP23 (SEQ ID NO:49)
P. communis cell suspension culture             PcAGP9 (SEQ ID NO:66)
P. communis cell suspension culture             PcAGP2 (SEQ ID NO:91)
N. alata style                                  Na35_1 (SEQ ID NO:63)
N. alata style                                  AGPNa11 (SEQ ID NO:72)



The names set out below have been allocated for the genes corresponding to
clones of these embodiments, as follows:

cDNA clone      Gene
NaAGP1          AGP Na1
NpAGP1          AGP Np2
PcAGP9          AGP Pc1
PcAGP2          AGP Pc2
AGPNa11         AGP A^al



The invention further provides antisense RNA probes designed such that they
comprise one or more nucleotide sequences encoding amino acid sequences that are
OAST-rich, representing the same or different AGPs. Also provided are RNA probes
comprising a nucleotide sequence encoding an OAST-rich consensus sequence for
plant AGPs. A guessmer-antisense RNA probe approach may also be used with an
OAST-poor AGP sequence to isolate a corresponding AGP gene.

It is also an object of the present invention to provide an antibody to a
substantially pure plant AGP, or fragment thereof, comprising an amino acid
sequence consisting essentially of a whole or partial amino acid sequence derived
from a plant AGP gene. Also provided is an antibody to an isolated AGP peptide
fragment that is not enriched in hydroxyproline. Also provided by the invention is
an antibody to a synthetic AGP peptide, or fragment thereof.

This invention further contemplates the use of antibodies to substantially pure
AGP peptides, AGP peptide fragments not enriched in hydroxyproline or OAST
content, or synthetic AGP peptides for (a) the detection, isolation, or diagnosis of
AGPs in AGP-containing mixtures or tissues, and (b) reducing or inhibiting natural
biological and chemical AGP activities. In this regard, it is contemplated that
polyclonal and monoclonal antibodies to AGPs or AGP peptides will be most effective
in the detection, isolation or diagnosis of AGPs in AGP-containing mixtures or tissues
that are deglycosylated or otherwise preconditioned to expose the protein backbone of
the AGP.

This invention also provides a genetically-engineered DNA molecule
comprising a plant AGP gene under control of a heterologous promoter such that a
nonglycosylated AGP is expressed. In a specific embodiment of the invention, an
AGP gene obtained from N. alata, N. plumbaginafolia, or P. communis is inserted
behind a heterologous promoter (e.g., a bacterial, viral, plant, etc., promoter) in a
host cell such that a nonglycosylated AGP is expressed.

It is also an object of the invention to provide a genetically-engineered DNA
molecule comprising a plant AGP gene under control of a heterologous promoter such
that a glycosylated AGP is expressed. For example, this invention contemplates the
utilization of the expressed nonglycosylated AGP as a substrate for glycosylating and
carbohydrate-protein linking enzymes (e.g., prolyl hydroxylase, glycosyl transferase,
etc.), to produce a glycosylated AGP. It is also an object of the invention to provide
a host cell (for example, monocots, dicots, etc.) transformed with
genetically-engineered DNA comprising a plant AGP gene under control of a heterologous
promoter such that a glycosylated AGP is expressed. It is a further object of the
invention to provide a plant AGP gene-transformed host cell capable of over-producing
or under-producing nonglycosylated AGP. It is an additional object of the
invention to provide an AGP gene-transformed host cell capable of further metabolic
processing of an expressed nonglycosylated AGP.

This invention further provides a genetically-engineered DNA molecule
comprising a plant AGP promoter. In specific embodiments of the invention, AGP
promoters are isolated from N. alata, N. plumbaginafolia, and P. communis.
Subsequently, a recombinant DNA molecule is genetically engineered to comprise a
plant AGP promoter situated adjacent to a heterologous structural gene such that the
structural gene is expressed under the control of the plant AGP promoter. Also, the
coding region of the gene could be used behind tissue-specific promoters to express
the AGP at particular sites in a whole plant. This could change the phenotypes with
respect to such functions as pest resistance, for example.

The instant invention provides a source of AGP that is not dependent upon its
isolation from plant exudates, e.g., gum arabic, guar gum, etc. The availability of
natural sources of AGP-containing gums, e.g., from trees, roots, seeds, seaweed,
etc., presents problems associated with harvesting, climate, man-power, fermentation,
isolation, purity, and high costs. The production of AGPs using recombinant gene
technology ensures a method of supplying AGP that (a) is independent of harvesting
problems, (b) enables high levels of quality control, (c) provides a supply of
substantially pure AGP product, (d) permits an overproduction of AGP in a host
cell, and (e) can be adapted to produce a specifically engineered AGP having
desired properties. Thus, this invention provides a means for supplying the functions
and utilities of plant gums, e.g., gum arabic, etc., without the need for finding
renewable but shrinking natural sources of plant gums. These functions find wide
applications as thickening, gelling, emulsifying, dispersing, suspending, stabilizing,
encapsulating, flocculating, film-forming, sizing, adhesive, binding and/or coating
agents, and/or as lubricants, water-retention agents, and coagulants.

Brief Description of Drawings

Figures 1A and 1B present different strategies for the preparation of
single-stranded antisense RNA probes from oligonucleotides. Fig. 1A, single
oligonucleotide probes: (Fig. 1A-1) Two complementary guessmers annealed to each
other to form a double-stranded construct containing the T7 promoter. (Fig. 1A-2) A
short primer annealed to form double-stranded T7 promoter sequence. Fig. 1B, double
oligonucleotide probes: (Fig. 1B-1) Two guessmers annealed to each other through the
complementary adaptor sequences at their 3'-ends. (Fig. 1B-2) Two guessmers annealed
to a mediator DNA through their adaptor sequences. Legend (symbol as drawn in the
figure): adaptor sequence. Promoters for other RNA polymerases, for example T3
or SP6, may also be used.

Figure 1C presents a Coomassie blue stained SDS-PAGE gel blot of
deglycosylated and non-deglycosylated AGPs from various sources. AGPs were
isolated from suspension culture filtrates of N. alata, N. plumbaginafolia and pear
(Pyrus communis) by Yariv precipitation and deglycosylated with
trifluoromethanesulfonic acid (TFMS). The deglycosylated and non-deglycosylated
AGPs were separated on a 17.5% SDS-PAGE gel and blotted onto a PVDF
membrane. After staining with Coomassie blue, the major band (MW 20-30 kD,
indicated by an arrow) from deglycosylated N. alata AGPs was excised and
sequenced.

Figure 1D presents a PCR strategy for cloning of the NaAGP1 gene
corresponding to an amino acid sequence of a deglycosylated AGP backbone from N.
alata cell suspension culture. The sequences of the NaR1, NaF1, and NaF2 primers
used to isolate the clone for NaAGP1 are given in Table 1.1 and Figure 1E.

Figure 1E presents the nucleotide and derived amino acid sequences of the
160-bp primer extension fragment. The derived amino acid sequence corresponding
to the peptide sequence obtained by protein microsequencing is underlined. The
asterisks (*) indicate the amino acids of the peptide obtained by direct
microsequencing which are identical with the derived sequence. The sequences of the
two oligonucleotides (NaF1, NaF2) designed for the amplification of the 3'-fragment
of the AGP gene are double-underlined. The nucleotide sequence corresponding to the
primer (NaR1) is underlined.

Figure 1F presents the nucleotide and predicted amino acid sequences of the
NaAGP1 cDNA from N. alata cell suspension culture (NaAGP1). The nucleotide
sequence obtained by PCR, which does not overlap with the cDNA clone, is in
italics. The derived amino acid sequence corresponding to the peptide sequence
obtained by protein microsequencing is underlined. The asterisks (*) indicate the
amino acids of the peptide obtained by direct microsequencing which are identical
with the derived sequence. A predicted signal sequence is dot-underlined.
X = undetermined residue.

Figure 1G presents a summary of key structural features of the derived amino
acid sequence of the NaAGP1 cDNA. The hydropathy values of each amino acid
have been determined using an interval of nine amino acids according to the weight
system of Kyte and Doolittle (1982). Values above the dotted line indicate
hydrophobic regions, and the values below the dotted line represent hydrophilic
regions.
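
A hydropathy profile of the kind summarized in Figure 1G can be sketched in a few lines of Python. The block below is a minimal illustration, not the procedure used to prepare the figure: it averages Kyte and Doolittle hydropathy values over a sliding window of nine residues. The function name, the example sequence, and the choice to score hydroxyproline ("O") with the proline value are assumptions made here for illustration.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumption: hydroxyproline ("O") is scored like proline, since the
# published scale has no entry for it.
KD["O"] = KD["P"]


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window of `window` consecutive residues."""
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch followed by a hydrophilic one.
profile = hydropathy_profile("MLLIVAALLVSSDKEKDSTOAOSO")
for start, value in enumerate(profile, start=1):
    print(f"window starting at residue {start}: {value:+.2f}")
```

Positive averages mark hydrophobic stretches, such as a candidate signal sequence, and negative averages mark hydrophilic ones. Where the dotted line of the figure sits is not stated in this description, so no cutoff is applied here.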

Figure 1H presents the nucleotide and predicted amino acid sequences of an
N. plumbaginafolia AGP derived from cell suspension culture (NpAGP1). The
derived amino acid sequence corresponding to the peptide sequence obtained by protein
microsequencing is underlined. The asterisks (*) indicate the amino acids of the
peptide obtained by direct microsequencing which are identical with the derived
sequence.

O: hydroxyproline.

Figure 1I presents the alignment of the derived amino acid sequences of the
NaAGP1 and NpAGP1 cDNAs. The derived amino acid sequence of the NaAGP1
cDNA is shown in the upper line and that of the NpAGP1 cDNA in the lower line.
Identical aligned residues are indicated with '|'. Gaps were introduced when required
to maximize the alignment.

Figure 1J presents the alignment of the NaAGP1 and the NpAGP1 cDNA
sequences. The nucleotide sequence of the NaAGP1 cDNA is shown in the upper
line and that of the NpAGP1 cDNA in the lower line. Identical aligned residues are
indicated with '|'. Gaps were introduced when required to maximize the alignment.

Figure 1K presents northern blot analyses of the NaAGP1 and NpAGP1
genes.

Fig. 1K-1: Total RNA was isolated from N. alata (1) leaves, (2) pollen, (3)
styles, (4) stems, (5) petals, (6) roots and (7) suspension-cultured cells. Equal
amounts (10 µg/lane) of RNA were fractionated on formaldehyde agarose gels,
transferred to Hybond-N membranes, and hybridized with 32P-labeled 5'-probe (1-540
bp) and 3'-probe (541-1700 bp) of the NaAGP1 cDNA, respectively.

Fig. 1K-2: Total RNA (10 µg/lane) isolated from suspension-cultured cells
of N. alata and N. plumbaginafolia was blotted and hybridized with the NaAGP1
cDNA.

The size of RNA transcripts is indicated at the right.

Figures 2A-2D present a flow chart describing the isolation and sequencing of
AGP peptides from cell suspension culture filtrates of Nicotiana plumbaginafolia.

Figures 3A-3F present a flow chart describing the isolation and sequencing of
AGP peptides from cell suspension culture filtrates of Pyrus communis.

Figure 3G presents the nucleotide and derived amino acid sequences of the
350-bp PCR fragment. The derived amino acid sequence matching the peptide
sequence obtained by protein sequencing is underlined. The nucleotide sequence
corresponding to the PcA23F2a primers is double-underlined.

Figure 3H presents the nucleotide and predicted amino acid sequences of the
PcAGP23 cDNA clone encoding an AGP backbone from pear cell suspension culture.
The translational initiation and stop sites are in bold-face. The predicted secretion
signal is underlined with dots. The two potential N-glycosylation sites are
double-underlined. The sequences matching the peptide sequences obtained from the AGP
protein backbone are underlined. The proline residues which are hydroxylated, as
identified by protein sequencing, are indicated by an "O" underneath.

Figures 4A-4C present a flow chart describing the isolation and sequencing of 
AGP peptides from style extract of Nicotiana alata. 

Figure 4D presents the cloning strategy of the Na35_l gene. 

Figure 4E presents the nucleotide sequence of the PGR fragment by using 
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RT35-specific primer and the predicted amino acid sequences. The derived amino 
acid sequence corresponding to the peptide sequence by protein microsequencing is 
underlined. The RT35 specific primer sequence is double underlined. 

Figure 4F presents the nucleotide sequence of NA35_1 cDNA clone and the 
5 predicted amino acid sequences. The derived amino acid sequence corresponding to 
the peptide sequence by protein microsequencing is underlined. 

Figure 4G presents northern blot analyses of NA35_1 gene expression in
various parts of N. alata. Total RNAs from N. alata styles (S2S2, S3S3, S5S6; 10 µg
each), leaves (S^Sg, 10 µg), stems (S^S^, 10 µg) and roots (SgSg, 6.3 µg) were
fractionated on a formaldehyde agarose gel, transferred to a nylon membrane, and
hybridized with 32P-labeled NA35_1 probe. The size of the RNA transcripts is
indicated in kilonucleotides.

Figure 4H presents northern blot analyses of NA35_1 gene expression in
various suspension-cultured cells and plants. Total RNAs (10 µg/lane) isolated from
suspension-cultured cells of N. alata, N. plumbaginafolia and Pyrus, and from styles
of N. alata (SgSg) and L. peruvianum, were blotted and hybridized with the NA35_1
probe. The size of the RNA transcripts is indicated in kilonucleotides.

Figure 4I presents reversed phase HPLC (RP-HPLC) separation of
thermolysin cleavage products of the RT25 protein backbone. RT25 protein
backbone (5-10 µg) was digested with thermolysin and loaded onto an RP-300 column
(2.1 x 100 mm, C8, ABI) equilibrated in 0.1% TFA at 1 ml/min. Unbound material
was collected and bound material eluted with a linear gradient (0-60% acetonitrile in
0.1% TFA; 60 min; 100 µl/min). Peptides (P1-6) eluted from the column were
monitored at A215nm. Thermolysin was eluted after retention time 40 min. Individual
peptides were subjected to amino acid sequencing.

Figure 4J presents reversed phase HPLC separation of endoproteinase Asp-N
cleavage products of the RT25 protein backbone. RT25 protein backbone was
digested with endoproteinase Asp-N. The resulting peptides were loaded onto an
RP-300 column (2.1 x 100 mm, C8, ABI) equilibrated in 0.1% TFA at 1 ml/min.
Unbound material was collected and bound material eluted with a linear gradient
(0-60% acetonitrile in 0.1% TFA; 60 min; 100 µl/min). Peptides eluted from the
column were monitored at A215nm. Peptides A1 and A2 were subjected to amino acid
sequencing. Undigested starting material (RT25) was also detected.

Figure 4K presents nucleotide and deduced amino acid sequences (SEQ ID
NO:72) of the AGPNal 1 cDNA clone. The putative secretion signal (dot underlined)
was predicted by using the PSIGNAL program (PC/Gene software, IntelliGenetics)
based on the method described by Von Heijne (1986) Nucl. Acids Res. 14:4683-4690.
Internal peptide sequences from amino acid sequencing are indicated by solid
underlines and Hyp is shown encircled. Dash (-) indicates the stop codon.

Figure 4L presents a hydropathy plot of the deduced amino acid sequence
from the AGPNal 1 cDNA clone. The hydrophobicity of the deduced amino acid
sequence was calculated by the SOAP program (PC/Gene software, IntelliGenetics)
based on the method developed by Kyte and Doolittle (1982) J. Mol. Biol. 157:105-132.
The putative secretion signal (shadowed) was predicted by using the PSIGNAL
program (PC/Gene software, IntelliGenetics) based on the method described by Von
Heijne (1986) supra.

Figure 4M presents an RNA blot analysis of expression of the AGPNal 1 gene
in N. alata and other plants. Total RNA (10 µg/lane) isolated from (Fig.4M-1)
tissues of N. alata (genotype S^S^): style, ovary, petal, anther, stem, leaf and root;
and (Fig.4M-2) styles of N. alata, N. sylvestris, N. tabacum, N. glauca, L.
peruvianum and leaves of Arabidopsis and rye grass was run in a 2% agarose gel
(15% formaldehyde; 40 mM MOPS buffer, pH 7.0) and blotted onto a Hybond-N
nylon membrane (Amersham). The AGPNal 1 cDNA fragment was labeled to 10*
cpm/µg with 32P-dCTP. Hybridization was performed at 60°C overnight in 0.22 M
NaCl, 15 mM NaH2PO4, 1.5 mM EDTA, 1% SDS, 1% BLOTTO and 4 mg/ml
herring sperm DNA. The membrane was washed for 2 x 10 min at room
temperature in 2x SSC, 1% SDS, and then for 2 x 10 min at 60°C in 0.2x SSC, 1% SDS.

Figure 4N presents an SDS-PAGE analysis of N. alata style AGPs at various
stages of purification. SDS-PAGE (10% gel) was followed by (Fig.4N-1) silver staining
and (Fig.4N-2) staining with β-glucosyl Yariv reagent. Lane 1, total style extract (1
µg AGP). Lane 2, 95% (NH4)2SO4-supernatant (4 µg AGP). Lane 3, Mono Q-bound
AGP-containing fraction (4 µg AGP). Lane 4, Superose 6 AGP-containing fraction
(4 µg AGP). Lane 5, as Lane 3, but containing 20 µg AGP. Lane 6, as Lane 4, but
containing 20 µg AGP. Protein molecular weight markers (M) are shown on the left.

Figure 4O presents crossed-electrophoresis of AGPs from styles of N. alata
during fractionation. AGPs from (Fig.4O-1) crude style extract, (Fig.4O-2) 95%
(NH4)2SO4-supernatant, (Fig.4O-3) Mono Q-unbound AGP-containing fraction, and
(Fig.4O-4) Mono Q-bound fraction were first electrophoresed in a 1% agarose gel
horizontally, then vertically into a gel containing the β-glucosyl Yariv reagent.

Figure 5A presents the nucleotide and predicted amino acid sequences of
PcAGP9 encoding the protein backbone of an AGP from Pyrus communis cell
suspension culture. The putative secretion signal peptide is underlined with dots.
The sequences which match the peptide sequences obtained by protein sequencing are
underlined. The proline residues which are modified post-translationally to
hydroxyprolines are indicated by "O" underneath. X: undetermined residue.

Figure 5B presents northern blot analyses of the PcAGP9 gene. (Fig.5B-1)
Total RNA was isolated from pedicels (1) and cultured cells (2) of Pyrus communis;
cultured cells of Nicotiana plumbaginafolia (3), shoots of Brassica napus (4),
Arabidopsis thaliana (5) and Lycopersicon esculentum (6) and leaves of Lolium
temulentum (7). Equal amounts (10 µg/lane) of RNA were fractionated on
formaldehyde agarose gels, transferred to Hybond-N membranes, and hybridized with
32P-labeled PcAGP9 cDNA at 55 °C. The final wash was carried out at 55 °C for 30
min with 1x SSC + 0.1% SDS. (Fig.5B-2) The same RNA blot was hybridized and
washed at higher stringency (65 °C). The size of the PcAGP9 RNA transcript in
Pyrus communis cultured cells is indicated at the left.

Figure 5C presents a hydropathy plot of the deduced amino acid sequence of
PcAGP9 (SEQ ID NO:66). The hydropathy values of each amino acid have been
determined by using an interval of five to fifteen amino acids according to Kyte and
Doolittle (1982) supra. Values above the dotted line indicate hydrophobic regions
and values below the dotted line represent hydrophilic regions.
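
As a rough illustration of how a hydropathy profile of this kind can be computed, the following Python sketch averages the Kyte and Doolittle (1982) hydropathy values over a sliding window. The function name hydropathy_profile and the default window of nine residues are assumptions made for illustration (the text specifies only an interval of five to fifteen amino acids), and the sketch is not the program used to prepare the figure.

```python
# Kyte and Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    `window` is a hypothetical default chosen from the five-to-fifteen
    residue interval mentioned in the text.
    """
    sequence = sequence.upper()
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence (not from the specification): a hydrophobic stretch
    # followed by a hydrophilic stretch.
    for position, value in enumerate(hydropathy_profile("IIIIIKKKKK", window=5), 1):
        print(position, round(value, 2))
```

Positive window averages correspond to hydrophobic stretches and negative averages to hydrophilic stretches; where the dividing line is drawn on a plot is a presentation choice that the sketch does not fix.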

Figure 5D presents a flow chart of the separation of AGPs from Pyrus
communis (pear) cell suspension culture and the isolation of their protein backbones.

A. RP-HPLC (RP-300 column, 4.6 x 100 mm) profile of AGPs prepared by
precipitation with the β-glucosyl Yariv reagent. AGPs were loaded and the column
washed with solvent A (0.1% TFA in H2O). The unbound fraction was collected (not
shown). The bound material was eluted with a linear gradient (0-100% solvent B;
flow rate 1 ml/min; 60 min) (solvent B: 60% acetonitrile in solvent A). Individual
fractions from five separate runs were pooled for subsequent purification.

B. RP-HPLC (RP-300 column, 4.6 x 100 mm) profile of AGPs from the
major bound peak shown in A (retention time 5.0-10.57 min). Bound material was
eluted with a shallow gradient (0-15% solvent B; flow rate 1 ml/min; 60 min). Two
fractions (1 and 2) were separately collected and subjected to size-exclusion FPLC.

C. Superose-6 FPLC profiles of AGPs in the unbound fraction from A and
two eluted fractions from B. Samples were eluted in 25% acetonitrile, 0.2 M KCl, 5
mM KH2PO4 (flow rate 0.4 ml/min). The unbound fraction and Fraction 1 gave
single peaks; Fraction 2 resolved into two peaks (Peak 2A and 2B).

D. Superdex-75 FPLC profiles of protein backbones derived from AGPs in C
by HF deglycosylation. Samples were eluted in the same buffer used in C (flow rate
of 0.8 ml/min). The size of the protein was estimated from standard protein markers
(Pharmacia).

The X axis is retention time (min). The pathway for purification of the AGP
fractions, from which peptide sequences were obtained, is stippled.

Figure 5E presents the nucleotide and predicted amino acid sequence of
PcAGP2 cDNA (SEQ ID NO:91) encoding a putative AGP backbone from suspension
cultured cells of P. communis. The translational initiation and stop sites are in
bold-face. The predicted secretion signal is underlined with dots. The two long direct
repeats are double-underlined. The sequences matching the peptide sequences obtained
from the AGP protein backbone are underlined. The proline residues modified to
Hyp are indicated by an "O."

Detailed Description of the Invention The following definitions are provided
to clarify the intent and scope of the terms as used in the specification
and claims.

The term arabinogalactan protein or AGP as used herein refers to a Yariv
reagent-precipitable, glycosylated molecule in which the protein constituent typically
accounts for approximately 2 to 10% of the molecular weight of the molecule
[although AGPs having protein values outside this range are known (Anderson et al.
(1979) supra)] and in which carbohydrate usually accounts for most of the weight of
the molecule. Galactose and arabinose form the major carbohydrate constituents with
other monosaccharides and uronic acids as minor components; the galactosyl residues
are organized to form a backbone of 3-linked galactose with branches through C(O)6;
the arabinosyl residues are predominantly in terminal positions. AGPs specifically
bind to and are precipitated by β-glycosyl Yariv reagents as a red colored complex.
AGPs usually comprise a domain(s) enriched in hydroxyproline, alanine, serine, and
threonine.

The term Yariv reagent-precipitable as used herein refers to an AGP that is
capable of being precipitated by β-glucosyl Yariv reagents.

The term native AGP as used herein refers to an AGP in its native state, i.e.,
glycosylated.

The term glycosylated AGP as used herein refers to an AGP molecule
comprising the carbohydrate components linked to the protein skeleton or backbone.

The term deglycosylated AGP as used herein refers to a native AGP or a
glycosylated AGP which has been subjected to treatment for removal of carbohydrate
and which, as a result, contains a decreased but variable carbohydrate content.

The term nonglycosylated AGP or AGP backbone as used herein refers to a
protein skeleton or backbone of an AGP molecule which is not glycosylated.

The term synthetic AGP as used herein refers to an AGP molecule which is
chemically synthesized.

The term synthetic nonglycosylated AGP as used herein refers to a peptide
backbone of an AGP which is chemically synthesized.

The term enriched in hydroxyproline or hydroxyproline-enriched or
hydroxyproline-rich as used herein refers to a region or domain or segment of an
amino acid sequence that has a hydroxyproline content of greater than 15%, and
usually about 50% or greater.

The term OAST-enriched or high content of hydroxyproline, alanine, serine,
and threonine or enriched in OAST content as used herein refers to a region of an
amino acid sequence wherein the sum of the hydroxyprolyl, alanyl, seryl, and
threonyl residues constitutes at least about 35%, and preferably at least about 60%, of
the total amino acid residues.

The term hydroxyproline-poor or not enriched in hydroxyproline as used
herein refers to a region or domain of a peptide sequence that has a hydroxyproline
content that is preferably less than 15%, more preferably less than 10% and most
preferably less than 5%. A hydroxyproline-poor region may also have an OAST
content that is preferably less than 50%, more preferably less than 35% and most
preferably less than 20%.
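
The hydroxyproline thresholds in these definitions can be applied mechanically to a peptide written in single-letter code, with "O" denoting hydroxyproline. The following Python sketch is illustrative only: the function names are hypothetical, an undetermined residue "X" is counted as a residue (an assumption), and a content of exactly 15% is left unclassified because the definitions do not assign it to either class.

```python
def hyp_percent(peptide):
    """Percentage of hydroxyproline ('O') residues in a single-letter peptide.

    Hyphens and other non-letter separators are ignored; 'X' (undetermined)
    is counted as a residue, which is an assumption of this sketch.
    """
    residues = [symbol for symbol in peptide.upper() if symbol.isalpha()]
    if not residues:
        raise ValueError("peptide contains no residues")
    return 100.0 * residues.count("O") / len(residues)


def classify_hyp(peptide):
    """Apply the 15% threshold used in the definitions above."""
    percent = hyp_percent(peptide)
    if percent > 15:
        return "hydroxyproline-rich"
    if percent < 15:
        return "hydroxyproline-poor"
    return "unclassified (exactly 15%)"


if __name__ == "__main__":
    # A-E-A-O-A-O-A-O-A-S is the ryegrass N-terminal fragment cited in this
    # specification; K-F-M-I-I-P is the segment used for primer design.
    for peptide in ("A-E-A-O-A-O-A-O-A-S", "K-F-M-I-I-P"):
        print(peptide, round(hyp_percent(peptide), 1), classify_hyp(peptide))
```

The same approach extends to the graded preferences (less than 10%, less than 5%) by comparing the returned percentage against those values.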

The term a characterizing sequence or a sequence characterizing an AGP as
used herein refers to a sequence that is hydroxyproline-rich and/or enriched in
OAST content.

The term guessmer as used herein refers to an oligonucleotide that contains
only a subset of the possible codons at each position. Guessmer is a term used
routinely in the art and is thoroughly elucidated in Molecular Cloning, A Laboratory
Manual, J. Sambrook, E.F. Fritsch and T. Maniatis, 2nd edition, Cold Spring Harbor
Laboratory Press, 1989, pp. 11.11-11.16. In many cases, a guessmer is a
chemically-synthesized, single oligonucleotide, 30-70 nucleotides in length, that
contains the combination of codons most likely to match the authentic gene.

The term antisense RNA probe as used herein refers to an RNA strand
produced from a DNA template encoding a desired amino acid sequence. The
nucleotide sequence of the RNA is complementary to the coding strand of the DNA
template sequence.

The term substantially pure as used herein refers to a protein that is
substantially free of other proteins with which it is associated in nature.

The isolation of AGP genes from N. alata, N. plumbaginafolia, and P.
communis suspension cultures and N. alata styles, as illustrated herein, exemplifies
the present invention, which embraces the utilization of an amino acid sequence of a
region of an AGP peptide from a plant cell to isolate a corresponding plant AGP
gene. Not all regions or domains of AGP peptide sequences can be used equivalently
to produce viable oligonucleotide primers for the isolation of AGP genes. AGP genes
have been successfully isolated by using two different strategies:

(A) the use of a non-hydroxyproline-rich sequence as a primer template to
    obtain a corresponding AGP gene, and
(B) the use of a guessmer-antisense RNA probe approach wherein the guessmer
    can comprise a nucleotide sequence encoding a hydroxyproline-rich
    segment to obtain an AGP gene encoding the sequence of the
    hydroxyproline-rich segment.



In strategy A, the preferred sequences are those that have a low content of, or are
deficient in, hydroxyproline. Hydroxyproline-poor sequences are found in terminal
regions as well as in internal domains of AGP peptides. It is also preferable that
sequences of AGP peptides or fragments thereof, selected for synthesis of synthetic
oligonucleotide primers, have a low hydroxyproline, alanine, serine, and threonine
(OAST) content. It is particularly preferable that the content of the sum of these four
amino acid residues be less than 50%, and more preferably less than 35%, and most
preferably less than 20%, of the total amino acid residues. AGP sequences that are
useful in isolating an AGP gene using PCR technology, i.e., sequences that are
hydroxyproline-poor, or OAST-poor, are not available in the prior art.

The amino acid sequence selected as a template for the synthesis of an
oligonucleotide primer should not be one that gives degenerate PCR primers having
concentrated GC-rich regions. Primers having concentrated GC-rich sequences
frustrate attempts to obtain cDNA by the PCR technique. For
example, AGP peptide fragments published in the art are the following:

from carrot (Jermyn, 1985, supra)*

(1) A-D/N-A-O-A-O-S-O-A/T-O/S-(O) (SEQ ID NO:1)
(2) D-E-A-O-A-O-A-O-S-O-M- (SEQ ID NO:2)
(3) G/E-O-A-O-A-O-A-O-(Q)-(V)- (SEQ ID NO:3)

from ryegrass, Lolium multiflorum (Gleeson et al., 1989, supra)*

(1) A-E-A-O-A-O-A-O-A-S (SEQ ID NO:4) (N-terminal)
(2) K-A-A-A-S-O-O-A-O-A-O-K- (SEQ ID NO:5)
(3) A-O-A-O-A-O-V/H-O-E-A (SEQ ID NO:6)
(4) S/L-T-A-O-V-A-A-O-T-T-(X)-O- (SEQ ID NO:7)
(5) S-O-P-A-O-A- (SEQ ID NO:8)
(6) A-A-A-(S)-L-(K)- (SEQ ID NO:9)

and from rose (Komalavilas et al., 1990, supra)

(A)-D-A-O-A-O-S-O-V (SEQ ID NO:10)

Residues in brackets indicate uncertain residues. X = undetermined residue.

Although these amino acid sequences of AGP peptide fragments from carrot,
ryegrass, and rose are known in the art, AGP genes corresponding to these peptide
fragments are still not known in the art. All of these art-known plant AGP peptide
fragments have amino acid sequences that are characterized by a high content of
hydroxyproline, alanine, serine, and threonine. These amino acid partial sequences
are such that they give GC-rich oligonucleotide primers. For this reason, no one to
date has been successful in obtaining AGP cDNAs directly from these sequences.
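
The link between OAST-rich peptides and GC-rich primers follows from the standard genetic code: proline (and hence hydroxyproline, which arises post-translationally from proline) is encoded by CCN and alanine by GCN. The Python sketch below estimates the expected GC fraction of a reverse-translated peptide by averaging over the synonymous codons of each residue with equal weight. The equal weighting, the function names, and the restriction of the codon table to the residues used in the example are assumptions made for illustration.

```python
# Standard genetic code, restricted to the residues needed for this example.
# 'O' (hydroxyproline) is reverse-translated as proline.
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCT"],
    "P": ["CCA", "CCC", "CCG", "CCT"],
    "O": ["CCA", "CCC", "CCG", "CCT"],
    "S": ["TCA", "TCC", "TCG", "TCT", "AGC", "AGT"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "M": ["ATG"],
    "I": ["ATA", "ATC", "ATT"],
}


def codon_gc(codon):
    """Fraction of G and C bases in one codon."""
    return sum(base in "GC" for base in codon) / len(codon)


def mean_gc(peptide):
    """Expected GC fraction of the coding sequence for a peptide, weighting
    all synonymous codons equally (an assumption of this sketch)."""
    residues = [symbol for symbol in peptide.upper() if symbol.isalpha()]
    per_residue = [
        sum(codon_gc(codon) for codon in CODONS[residue]) / len(CODONS[residue])
        for residue in residues
    ]
    return sum(per_residue) / len(per_residue)


if __name__ == "__main__":
    # An A-O repeat as found in the art-known fragments, compared with the
    # hydroxyproline-poor segment K-F-M-I-I.
    for peptide in ("A-O-A-O-A-O", "K-F-M-I-I"):
        print(peptide, f"{100 * mean_gc(peptide):.1f}% GC")
```

Under these assumptions the A-O repeat comes out above 80% GC, whereas the hydroxyproline-poor segment comes out below 20% GC.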

Initially, attempts were made to obtain plant AGP genes using hydroxyproline- 
rich sequences obtained from isolated AGP fragments. The following sequences were 
utilized unsuccessfully: 



(i) N. plumbaginafolia, RT21, FAOS/NGGVALPOS (SEQ ID NO:28)

(ii) N. plumbaginafolia, LASOOAOOTADTOA (SEQ ID NO:27)

(iii) N. plumbaginafolia, IGAAOAGSOTSSPN (SEQ ID NO:29)

(iv) P. communis, RT16.4, LSOKKSOTAOSOS(S)TOOT(T) (SEQ ID NO:31)

Each of the sequences (i), (ii) and (iii), which are found in both N. alata and N.
plumbaginafolia AGPs, was used in both N. alata and N. plumbaginafolia to isolate
an AGP gene. Sequence (iv) was used to obtain an AGP gene from P. communis.
None of these sequences led to the isolation of a corresponding gene with methods
based primarily on DNA hybridization. All of these sequences produced
oligonucleotide primers that were highly redundant and very GC-rich (in some cases
greater than 80%). Consequently, even at high stringency,
hybridization bands were obtained which, on sequencing, had no relationship to the
amino acid sequence. On examination of the above sequences, it may be seen that all
four of these sequences are OAST-enriched, i.e., (i) 50%, (ii) 85.7%, (iii) 64.3%,
and (iv) 84.2%, respectively. (It is noted that in further embodiments of this
invention, these sequences could in fact be utilized for the detection and isolation of
corresponding genes with methods based on other hybridization principles.)
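
The OAST percentages quoted for sequences (i) to (iv) can be reproduced by a simple count. In the Python sketch below, the function name oast_percent is hypothetical; bracketed (uncertain) residues are counted as read, and the ambiguous S/N position in sequence (i) is counted as serine, an assumption that yields the 50% figure given in the text.

```python
def oast_percent(peptide):
    """Percentage of hydroxyproline (O), alanine (A), serine (S) and
    threonine (T) residues in a single-letter peptide. Brackets and
    hyphens are ignored."""
    residues = [symbol for symbol in peptide.upper() if symbol.isalpha()]
    if not residues:
        raise ValueError("peptide contains no residues")
    oast = sum(symbol in "OAST" for symbol in residues)
    return 100.0 * oast / len(residues)


if __name__ == "__main__":
    sequences = {
        "(i)": "FAOSGGVALPOS",            # S/N position counted as S (assumption)
        "(ii)": "LASOOAOOTADTOA",
        "(iii)": "IGAAOAGSOTSSPN",
        "(iv)": "LSOKKSOTAOSOS(S)TOOT(T)",
    }
    for label, peptide in sequences.items():
        print(label, round(oast_percent(peptide), 1))
    # prints 50.0, 85.7, 64.3 and 84.2 for (i) to (iv)
```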

The instant disclosure overcomes this problem. Whereas isolated plant AGPs in
the art have been characterized exclusively by peptide fragments having high
hydroxyproline or OAST contents (AGP sequences having a low content of
hydroxyproline, or a low OAST content are not available in the prior art), the AGPs
isolated and described in the present disclosure are characterized not only by peptide
fragments that are hydroxyproline-rich but also by peptide fragments that are
hydroxyproline-poor, if not hydroxyproline-deficient. The fact that an AGP peptide
fragment that was not enriched in hydroxyproline had been isolated and sequenced
and the fact that this sequence, which is also low in hydroxyproline, alanine, serine,
and threonine content, had been utilized to synthesize degenerate primers, enabled
circumvention of the problems associated with GC-rich primers and led to the
isolation of a corresponding AGP cDNA.

The N-terminal region of an isolated plant AGP can be used to obtain a
corresponding plant AGP gene. In a particular embodiment of the invention, the
N-terminal region of an AGP peptide obtained from N. alata suspension culture
comprised a hydroxyproline-poor region. The N-terminal peptide sequence,
A-K-S-K-F-M-I-I-P-A-S-X-T-X-A (SEQ ID NO:11), was used as a template for the
synthesis of an oligonucleotide primer which was further utilized for the isolation of a
hybridizing AGP gene from both N. alata and N. plumbaginafolia.

In other specific embodiments of the invention, hydroxyproline-poor sequences
from internal regions of AGPs from P. communis suspension culture and from N.
alata style were used to obtain corresponding AGP genes. For example, in the case
of P. communis, the AGP backbone encoded by the PcAGP23 gene (SEQ ID NO:49)
is hydroxyproline-poor not only at the terminal regions but also internally, and an
internal sequence (SEQ ID NO:41) was used to obtain a pear AGP gene. Similarly,
for the N. alata style AGP backbone encoded by the NA35_1 cDNA clone (Figure
4F), the N-terminal region and internal regions have low hydroxyproline contents,
and an internal sequence (SEQ ID NO:58) was used to obtain an N. alata style AGP
gene.

This basic approach (strategy A) for obtaining a plant AGP gene enabled the
successful isolation of AGP genes from N. alata, N. plumbaginafolia, and P.
communis cell suspension cultures, as well as from N. alata styles. In each case, the
cDNA clone comprised a derived amino acid sequence which contained a
hydroxyproline-poor domain and a hydroxyproline-enriched domain (a region
enriched in OAST content).

In strategy B, a method is provided that enables the use of a specific OAST-rich
AGP peptide sequence for the isolation of a corresponding gene. This method
involves the screening of libraries with RNA probes prepared from a single long
guessmer (an oligonucleotide containing only a subset of the possible codons at each
position) encoding a desired specific OAST-rich AGP peptide sequence. In order to
produce an RNA probe from a DNA oligonucleotide, a bacteriophage promoter (e.g.,
T7 or T3 RNA polymerase promoter) is linked at the 5'-end of the oligonucleotide.
In addition, the oligonucleotide, which is single-stranded, must be converted into a
partially or completely double-stranded DNA fragment, because the T7 (or T3) RNA
polymerase will not recognize single-stranded promoter sequences. Relevant
procedures for obtaining either DNA or RNA probes from a DNA template are
known in the art [Berger and Kimmel (1987) Methods in Enzymology 152].

Figure 1A presents schematically several ways of producing an RNA probe
involving the use of a single (Figure 1A) or double (Figure 1B) oligonucleotide
probe. For example, in Figure 1A-1, a second oligonucleotide, which is
complementary to the guessmer encoding a desired AGP peptide, is synthesized and
the two oligonucleotides are annealed to form double-stranded DNA. Alternatively,
as shown in Figure 1A-2, a short complementary primer is annealed to the promoter
sequence of the guessmer to form a double-stranded RNA polymerase promoter
sequence. Using a double oligonucleotide probe approach (Figure 1B-1), an adaptor
sequence (15-18 bp long) is added to the 3'-end of the guessmer (oligonucleotide 1)
and a second guessmer (oligonucleotide 2), which encodes
a different OAST-rich AGP peptide sequence, with an adaptor sequence
complementary to the adaptor of the first oligonucleotide, is synthesized. The two
guessmers are thus annealed through their complementary adaptor sequences and the
protruding single-stranded regions filled in by primer extension to produce a
double-stranded DNA fragment. Figure 1B-2 further demonstrates a method whereby the
adaptor sequences are designed in such a way that they bind to opposite strands of a
mediator DNA, enabling the two guessmers to be joined together by a PCR reaction
to form a double-stranded DNA fragment.

Single-stranded RNA probes are superior to DNA probes for the screening of
libraries. RNA probes can be labeled to much higher specific activity and bind more
tightly to a target DNA, thus yielding stronger signals in hybridization reactions. The
greater stability of hybrids involving RNA enables the use of higher hybridization
stringency, thus increasing hybridization specificity. Unhybridized RNA probes can
be removed by RNase digestion, further reducing the background.

A single long guessmer (40-70 bp) rather than short degenerate oligonucleotides
is used to avoid the extremely high degeneracy associated with OAST-rich AGP
peptide sequences. It is preferred that the guessmer be longer than 40 bp in order
that the increased stability of hybrids formed by the long oligonucleotide outweighs
the detrimental effects of mismatches. Anti-codons GGU, CGU, UGU, and AGU
should be used for Pro (Hyp), Ala, Thr, and Ser, respectively. This is based on the
consideration that the nucleotide base "A" is the preferred base in the third position
of codons for Pro, Ala, and Thr. The other consideration is that the nucleotide base
"U" can pair not only with "A" but also with "G" to some extent; hence GGU can
pair with CCA or CCG for proline residues, for example. Therefore, it is further
contemplated that antisense RNA rather than sense RNA probes be used for the
screening of libraries.
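
The anti-codon rule above can be turned into a short routine that writes out the antisense RNA corresponding to an OAST-rich peptide. The Python sketch below is illustrative only. It assumes that the anti-codons GGU, CGU, UGU and AGU are written aligned base-for-base against their codons (i.e., in the 3' to 5' direction, so that GGU lies opposite CCA), an interpretation inferred from the pairing example in the text. It also handles only the four residues for which the text gives anti-codons; the function names are hypothetical.

```python
# Anti-codons from the text, written 3'->5' (aligned against the 5'->3' codon).
# 'O' (hydroxyproline) is treated as proline.
ANTICODON_3_TO_5 = {"P": "GGU", "O": "GGU", "A": "CGU", "T": "UGU", "S": "AGU"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _residues(peptide):
    residues = [symbol for symbol in peptide.upper() if symbol.isalpha()]
    unsupported = sorted(set(residues) - set(ANTICODON_3_TO_5))
    if unsupported:
        raise ValueError(f"no anti-codon given in the text for: {unsupported}")
    return residues


def sense_codons(peptide):
    """Codons (5'->3') that pair with the chosen anti-codons."""
    return [
        "".join(WATSON_CRICK[base] for base in ANTICODON_3_TO_5[residue])
        for residue in _residues(peptide)
    ]


def antisense_guessmer(peptide):
    """Antisense RNA (5'->3') for a peptide given N- to C-terminus."""
    three_to_five = "".join(ANTICODON_3_TO_5[r] for r in _residues(peptide))
    return three_to_five[::-1]  # reverse to conventional 5'->3' notation


if __name__ == "__main__":
    peptide = "A-O-T"  # toy tripeptide, for illustration only
    print(sense_codons(peptide))        # ['GCA', 'CCA', 'ACA']
    print(antisense_guessmer(peptide))  # UGUUGGUGC
```

A peptide of at least fourteen residues gives a guessmer longer than 40 nucleotides, the minimum length preferred above; residues other than Pro (Hyp), Ala, Thr and Ser would require codon choices that the text does not specify.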

AGP peptides were isolated from plant cell suspensions by precipitation with
Yariv reagent (a red dye, β-glucosyl reagent described by Yariv in 1967). This dye
was prepared by coupling diazotized 4-aminophenyl glucopyranoside to phloroglucinol
and the reagent was used to precipitate AGPs. The AGPs from suspension-cultured
cells were prepared by precipitation of AGPs from either the culture medium or from
the Biopolymer products (the high molecular weight materials precipitated with four
volumes of ethanol from a cell suspension culture filtrate). An isolation procedure
independent of the Yariv reagent was also used to obtain AGPs from plant cells.
(The Yariv reagent was used later in the isolation procedure to identify fractions
containing AGPs.) The AGPs from N. alata style extracts were prepared by
(NH4)2SO4 precipitation and further fractionation of the AGP-containing supernatant
by Mono Q (Pharmacia) anion-exchange chromatography. In a different procedure,
AGPs were initially fractionated by immunoaffinity chromatography using the J539
myeloma antibody (specific for Gal 1-6 β Gal sequences).

As is known in the art, AGPs can be isolated by several methods, including
affinity chromatography using, for example, galactose binding proteins; classical
chromatography, for example, gel filtration, ion-exchange, etc.; and also precipitation
by selective reagents, for example, Yariv reagents, lectins, for example, lectins that
bind galactosyl residues, including but not limited to, tridacnin, peanut agglutinin,
the Ricinus communis (RCA120) lectins and myeloma protein J539 [Clarke et al.
(1979) Phytochemistry 18:521-540; Fincher et al. (1983) Ann. Rev. Plant Physiol.
34:58], or antibodies to specific carbohydrate epitopes [Pennell et al. (1989) J. Cell
Biol. 108:1967-1977 and Norman et al. (1990) Planta 181:365-373].

AGP fractions were deglycosylated by treatment with trifluoromethane sulfonic
acid (TFMS) or by treatment with anhydrous hydrogen fluoride (HF). Additionally,
other methods for separating the protein and the carbohydrate components from each
other that are known in the art are contemplated by the invention [see Jermyn et al.
(1975) Aust. J. Plant Physiol. 2:501].

AGPs and AGP fragments, glycosylated or deglycosylated, were separated by
known separation techniques, for example, SDS-PAGE, HPLC reverse phase
chromatography, etc. In some cases, the peptides were further fragmented by
thermolysin digestion before separation. Separated peptides obtained off HPLC
reverse phase and ion-exchange columns were sequenced directly, although in some
cases the separated peptides were transferred to PVDF membranes for amino acid
sequencing [Ward et al. (1990) in Electrophoresis 11:883-891]. The use of other
known proteases, instead of or in addition to thermolysin, is contemplated by this
invention. Similarly, this invention contemplates the use of other techniques known
in the art for the preparation of pure peptide samples for amino acid sequencing.

From every source examined, multiple AGP backbones were observed.
Multiple backbones were reproducibly obtained whether the AGPs were separated
first and then individually deglycosylated or whether the whole AGP preparation was
deglycosylated first and then the individual peptides separated.

In a specific embodiment of the invention, total native AGPs were isolated by
Yariv reagent precipitation from the suspension culture filtrate of N. alata and
deglycosylated using TFMS. The resulting peptides were separated on a 17.5%
SDS-PAGE gel and blotted to a PVDF membrane. The major band (MW: 20-30 kD;
Figure 1C) was excised and sequenced. An N-terminal peptide sequence,
A-K-S-K-F-M-I-I-P-A-S-X-T-X-A (SEQ ID NO:11), was obtained.

In a particular embodiment of the invention, the N. alata AGP N-terminal
peptide sequence (SEQ ID NO:11) was used to isolate AGP genes from N. alata and
N. plumbaginafolia libraries (Figure 1D). Degenerate reverse primers corresponding
to part of the AGP N-terminal amino acid sequence, i.e., K-F-M-I-I-P, were
synthesized (Table 1.1) and used to obtain a 160-bp primer extension product (Figure
1E) which was then amplified by PCR. The 160-bp extension fragment was
subcloned and sequenced. The nucleotide sequence (SEQ ID NO:21) included a
derived peptide which matched the peptide sequence SEQ ID NO:11 isolated
from N. alata suspension culture.
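
The design of degenerate reverse primers from the segment K-F-M-I-I-P can be sketched as follows. The codons for this segment are written with IUPAC ambiguity codes (a notational choice of this sketch, not of Table 1.1), using only the first two bases of the proline codon, and the reverse complement is taken to obtain the reverse primer. The function names are hypothetical.

```python
from itertools import product

# IUPAC ambiguity codes used in this example and their complements.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "H": "ACT", "D": "AGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "R": "Y", "Y": "R", "H": "D", "D": "H"}


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence that may contain IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def degeneracy(sequence):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in sequence.upper():
        count *= len(IUPAC[base])
    return count


def expand(sequence):
    """List every non-degenerate sequence covered by a degenerate primer."""
    return ["".join(bases)
            for bases in product(*(IUPAC[base] for base in sequence.upper()))]


if __name__ == "__main__":
    # Lys AAR, Phe TTY, Met ATG, Ile ATH, Ile ATH, Pro CC(N): coding strand.
    coding = "AARTTYATGATHATHCC"
    primer = reverse_complement(coding)
    print(primer)              # GGDATDATCATRAAYTT
    print(degeneracy(primer))  # 36
```

Splitting the 36-fold degenerate pool on the Y position gives two 18-fold groups ending in CTT and TTT, and fixing two further positions within one group gives six 3-fold subgroups, which is consistent with the grouping of primers in Table 1.1.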

Additional primers, corresponding in sequence to parts of the 160-bp fragment
(e.g., NaF1 and NaF2; Figure 1E), were synthesized and used to amplify the 3'-part
of the AGP gene by nested PCR. A 1.6 kb fragment was amplified and sequenced.
The alignment of the sequences obtained from the two PCR reactions gave rise to a
DNA sequence of 1679 bp (Figure 1F). The PCR fragment encoded a protein which
contained the isolated peptide sequence (SEQ ID NO:11) with two mismatches: Arg
for Ala at position 1 and Pro for His at position 12 (Figure 1F).

The 1.6 kb PCR fragment was used to screen a cDNA library made from
RNA isolated from N. alata cells in suspension culture, and three positive clones were
isolated and sequenced. The alignment of the PCR sequences with the cDNA
sequence gave rise to a 1700-bp sequence (SEQ ID NO:24) including a poly(A) tail
of 7 bp (Figure 1F). This sequence was designated NaAGP1. Further primer extension
experiments suggested that the 1.7 kb NaAGP1 cDNA (SEQ ID NO:24) represented
the full-length sequence of the AGP transcript.

Table 1.1

A: Oligo primers used in the primer extension experiments

Ala Lys Ser Lys Phe Met Ile Ile Pro Ala Ser X   Thr X   Ala   (SEQ ID NO:11)
GCA AAA TCA AAA TTT ATG ATA ATA CCA GCA TCA     ACA     GCA   (SEQ ID NO:12)
  G   G   G   G   C       C   C   G   G   G       G       G
  C       C               T   T   C   C   C       C       C
  T       T                       T   T   T       T       T
        AGC                             AGC
          T                               T

B: Oligonucleotide primers designed

Group 1   5' GG TAT TAT CAT AAA CTT 3'   (SEQ ID NO:13)
                G   G       G
                A   A

Group 2   5' GG TAT TAT CAT AAA TTT 3'   (SEQ ID NO:14)
                G   G       G
                A   A

C: Subgroups of the group 1 primers

NaR1   5' GG (T/G/A)AT GAT CAT AAA CTT 3'   (SEQ ID NO:15)
NaR2   5' GG (T/G/A)AT AAT CAT AAA CTT 3'   (SEQ ID NO:16)
NaR3   5' GG (T/G/A)AT TAT CAT AAA CTT 3'   (SEQ ID NO:17)
NaR4   5' GG (T/G/A)AT GAT CAT GAA CTT 3'   (SEQ ID NO:18)
NaR5   5' GG (T/G/A)AT AAT CAT GAA CTT 3'   (SEQ ID NO:19)
NaR6   5' GG (T/G/A)AT TAT CAT GAA CTT 3'   (SEQ ID NO:20)

A: Amino acid sequence obtained from deglycosylated AGPs isolated
   from N. alata cell suspension culture and the corresponding
   codons.
B: The two groups of degenerate reverse primers designed for the
   primer extension experiment.
C: Subgroups of the group 1 primers.

The NaAGP1 cDNA comprised an open reading frame spanning 1383
nucleotides. The open reading frame encoded a polypeptide containing 461 amino
acid residues with a calculated molecular weight of 51.8 kD and a predicted pI of
3.84. The protein was very rich in asparagine (25%), and relatively rich in serine
(8.9%), tyrosine (7.5%), proline (7.2%) and glutamine (7.0%) (Table 1.2), and could
be divided into four domains (Figure 1G). At the N-terminus (residues 1-25), there
was a putative transmembrane helix which was very hydrophobic.

Table 1.2

Comparison of derived amino acid composition of NaAGP1 and NpAGP1.

              Full sequence       Pro-rich domain     Asn-rich domain
                (Mol%)(1)            (Mol%)(2)           (Mol%)(3)
Amino acid   NaAGP1   NpAGP1      NaAGP1   NpAGP1     NaAGP1   NpAGP1

Asn           25.0     26.2         4.7      3.3       44.1     43.4
Ser            8.9      9.8         8.7      9.4        9.8     10.3
Tyr            7.5      7.7         1.3      1.3       12.1     11.9
Pro            7.2      7.9        20.2     20.8        0.0      0.3
Glu            7.0      7.7         6.7      6.7        5.7      6.3
Gly            6.0      5.4         6.7      6.0        6.0      5.5
Phe            5.8      4.7         6.0      6.7        3.8      3.9
Thr            5.4      4.5        10.8     10.7        1.5      1.1
Asp            3.9      3.1         4.7      5.4        3.8      1.9
Ala            3.5      4.1         8.7      8.7        1.5      1.5
Leu            3.3      2.9         5.4      4.0        1.5      1.5
Val            3.3      3.1         4.7      4.0        2.2      2.3
Gln            3.1      2.9         2.7      3.3        1.9      1.5
Ile            2.7      2.9         4.0      4.7        0.7      1.1
Lys            2.5      2.5         2.0      1.3        2.2      3.1
Arg            1.6      1.5         1.3      1.3        1.5      1.5
Met            1.2      1.1         0.6      0.6        0.7      0.7
His            0.8      0.6         0.6      0.6        0.3      0.7
Cys            0.4      0.0         0.0      0.0        0.0      0.0
Trp            0.0      0.0         0.0      0.0        0.0      0.0

1. The NpAGP1 derived amino acid sequence is incomplete as the clone
is approximately 100 bp short.

2. The proline-rich domain is defined by amino acid residues 26-173 in
NaAGP1 and 14-161 in NpAGP1.

3. The Asn-rich domain is defined by amino acid residues 174-436 in
NaAGP1 and 162-412 in NpAGP1.
The next one-third of the protein (residues 26-173) was also hydrophobic and
contained most of the proline (93.8%), alanine (76.5%) and threonine (76.2%)
residues. These three amino acids accounted for 39.7% of all the amino acids in this
domain (Pro, 20.2%; Thr, 10.8% and Ala, 8.7%) (Figure 1G). This domain is
predicted to be the site of glycosylation by Gal/Ara-containing chains, linked through
hydroxyproline residues. The proline residues (residues 37, 39, 41, and 43 in Figure
1F) are known to be hydroxylated, as they appear as hydroxyproline (residues 25, 27,
29, and 31 in Figure 1H) in the peptide sequence obtained from deglycosylated
AGPs of N. plumbaginafolia. Such hydroxylation and glycosylation would make the
molecule considerably more hydrophilic.
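
The two residue numberings used above (NaAGP1 and the truncated NpAGP1) can be related by a simple offset. The sketch below is a consistency check only. It assumes that the two sequences align without insertions or deletions across the proline-rich domain, which the text does not state, and the function name `na_to_np` is hypothetical.

```python
# Proline-rich domain boundaries as given for Table 1.2.
NA_PRO_DOMAIN = (26, 173)   # NaAGP1
NP_PRO_DOMAIN = (14, 161)   # NpAGP1 (clone truncated at the 5'-end)

# Hydroxylated positions quoted in the text.
NA_SITES = [37, 39, 41, 43]  # Pro in NaAGP1
NP_SITES = [25, 27, 29, 31]  # Hyp in the N. plumbaginafolia peptide

# Assumption: a constant shift relates the two numberings in this region.
OFFSET = NA_PRO_DOMAIN[0] - NP_PRO_DOMAIN[0]

def na_to_np(position):
    """Map a NaAGP1 residue number to NpAGP1 numbering (gap-free alignment assumed)."""
    return position - OFFSET

print(OFFSET, [na_to_np(p) for p in NA_SITES])
```

The same shift maps both the domain boundaries and the four hydroxylated positions, which is consistent with the missing N-terminal residues of the incomplete NpAGP1 clone.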

The portion of the protein corresponding to amino acid positions 174-436 was
hydrophilic and contained most of the asparagine (95.1%) and tyrosine (94.1%)
residues, which accounted for 44.1% and 12.1%, respectively, of all amino acids in
this domain (Figure 1F and Figure 1G). The asparagine residues were distributed in
clusters of 2-10 residues along the polypeptide chain. This domain contained no
proline residues. The final 25 residues at the C-terminus were hydrophilic (Figure 1G).

An N. plumbaginafolia cell suspension cDNA library was also screened with
the PCR fragment, and four cDNA clones were isolated and sequenced. The four
clones were identical and contained an insert of 1430 bp (SEQ ID NO:25; Figure
1H). This AGP gene was designated NpAGP1. These cDNAs were incomplete and
predicted to be about 100 bp shorter at the 5'-end than the full-length sequence of the
transcript. NpAGP1 was not identical, but very similar, to NaAGP1 at both
the nucleotide and derived amino acid sequence level (86% and 84.7% identity,
respectively) (Figure 1I, Figure 1J, and Table 1.2). The transmembrane helix was
missing in the NpAGP1 cDNA due to the incomplete sequence. The difference
between the two AGP genes was mainly in the middle one-third of the sequence while
the N-terminal and C-terminal parts were highly conserved (Figure 1I and Figure 1J).
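
Percent identity figures such as those quoted above depend on how the alignment is scored. The following Python sketch shows one common convention (identical positions divided by aligned, non-gap positions). It is not the method used to obtain the 86% and 84.7% values, which the text does not specify, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Hypothetical toy alignment, not taken from NaAGP1 or NpAGP1.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

Other conventions (for example, counting gap columns in the denominator) give lower values for the same alignment, so reported identities are only comparable when the convention is stated.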

The NaAGP1 cDNA was cut into a 5'-half (nucleotides 1-540) corresponding to
the 5'-nontranslated part, the transmembrane helix and the proline-rich domain, and a
3'-half (nucleotides 541-1700) including the asparagine-rich domain, C-terminus, and the
3'-nontranslated part. These two parts of the cDNA were used separately to probe
northern blots of RNA [Sambrook et al. (1989) supra] isolated from suspension
cultured cells of N. alata and N. plumbaginafolia and various tissues of N. alata
plants. The two probes gave an identical hybridization pattern, confirming that these
two distinct domains are parts of the same transcript (Figure 1K). The NaAGP1
cDNA probes hybridized to the RNA samples from all the tissues of N. alata tested,
although the degree of hybridization and the size of the transcripts differed between
tissues. The highest signal was detected in RNA from N. alata suspension cultured
cells, whereas the signal in petals was barely detectable. Pollen and style tissues had a
smaller transcript of approximately 1.0 kb, compared with 1.6 kb in N.
plumbaginafolia cultured cells and 1.7 kb in all other tissues (Figure 1K). Genomic
southern blot analysis indicated that the AGP gene is a single copy or low copy gene
in the genome of N. alata.

In a preferred embodiment of the invention, the cDNA library was screened
with the labeled synthetic oligonucleotide probe derived from the hydroxyproline-poor
or the N-terminal AGP protein sequence. In an alternative embodiment of the
invention, individual recombinants within the cDNA library can be screened for
expression of an antigen (antibody recognition). Procedures for selecting cloned
sequences from a recombinant cDNA library are described in Kimmel (1987) Meth.
Enzymol. 152:393-399.

This invention also contemplates the use of oligonucleotide probes, e.g., AGP
cDNA, etc., for the detection of hybridizing sequences and the isolation of monocot
and dicot AGP genes. Pear AGP (PcAGP9) transcripts were detected in RNA
prepared from dicots as well as from a monocot.

cDNA clones which show a strong hybridization signal are sequenced to
confirm complementarity to the AGP amino acid sequence. In addition, the protein
encoded by the cDNA is shown to possess AGP characteristics. This is done, for
example, by transcribing the clone sequence with an appropriate RNA polymerase,
then translating the mRNA in, for example, a commercially available wheat germ
extract in vitro translation system. Alternatively, the identity of a clone is confirmed by
transformation into a suspension-cultured cell and identifying the product using a
suitable tag.

In another embodiment of the invention, the presence of AGP protein is
detected immunologically. For example, antibodies raised to an AGP peptide, or
fragment thereof, purified and isolated from an SDS-PAGE gel are shown to cross-react
with the purified AGP peptide. AGP-specific antibodies are also utilized to
bind and precipitate AGP from plant extracts as well as the product of the cloned
AGP gene. Polyclonal and monoclonal antibodies specific to AGP peptide are
prepared according to standard methods in the art. This type of immunological
testing is further utilized, for example, for optimization of expression of the cloned
AGP gene in a recipient organism.
This invention further contemplates the isolation of a genomic clone of AGP.
Genomic DNA is isolated according to the methods described by Herrmann and
Frischauf (1987) Methods Enzymol. 152:180-189. A PCR-based method is used to
clone a gene from genomic DNA using partial protein sequence (e.g., Aarts et al.
(1991) Plant Mol. Biol. 16:647) or cDNA fragment probes (e.g., King et al. (1988)
Plant Mol. Biol. 10:401-412). The genomic AGP gene may be utilized instead of the
cDNA to express AGP, in particular, in host systems where it appears that the native
promoter or post-translational system is required for full expression, e.g., plant cells.

As is well known in the art [see, for example, Glover (1984) Gene Cloning,
Brammar and Edidin (eds.), Chapman and Hall, NY], there are various strategies for
generation of cDNA libraries and for the cloning of the cDNA into an appropriate
DNA recombinant vector, e.g., the pUC family of plasmids or λgt10 or λgt11 phage
vectors. In an embodiment of the invention, a DNA recombinant vector carries a
constitutive or inducible promoter adjacent to the cloning site such that a transcript is
made specifically to either strand of the cDNA simply by using different RNA
polymerases. RNAs produced in this way can be used as hybridization probes or can
be translated in cell-free protein synthesis systems.

It is understood in the art that modifications may be made to the structural
arrangement and specific elements of a genetically-engineered recombinant DNA
molecule described herein without destroying the activity of gene expression. For
example, it is contemplated that a substitution may be made in the choices of
enhancer regulatory elements and/or promoters [e.g., preferably, an inducible
promoter (e.g., AdH1)] without significantly affecting the function of the recombinant
DNA molecule of this invention. It will also be understood that optimization of gene
expression also results from the use of preferred codons, and from the arrangement,
orientation, and spacing of the different regulatory elements as well as the multiple
copies of a particular element with respect to one another, and with respect to the
position of the TATA box, as will be apparent to those skilled in the art using the
teachings of this disclosure.
In another embodiment of the invention, AGPs were isolated from N.
plumbaginafolia suspension cultures. The medium from the cell suspension culture of
N. plumbaginafolia was separated from the cells by filtration and the high molecular
weight materials precipitated with four volumes of ethanol. The total native AGPs
were purified from the Biopolymer product by precipitation with the Yariv reagent
after depleting the starting material of pectins by CTAB (hexadecyl trimethyl
ammonium bromide) precipitation. The total native AGPs
were treated by two paths:

Path 1: Deglycosylation followed by reverse phase HPLC fractionation
before direct sequencing, or sequencing after enzymatic (proteolytic) digestion
[detailed in Example 2(c)2-5].

Path 2: Reverse phase HPLC fractionation followed by deglycosylation and
further reverse phase HPLC fractionation [detailed in Example 1(c)6-8].

Path 1 (deglycosylation followed by separation of AGPs) produced an
unbound peak and two major bound peaks, RT21 and RT32, with retention times of
21 min and 32 min, respectively, in reverse phase HPLC (see Figure 2A). Peak
RT21 was digested with thermolysin and refractionated by RP-HPLC prior to amino
acid sequencing. The sequences (SEQ ID NOS:26-29) obtained from peak RT21
exhibited a high content of hydroxyproline, alanine, serine, and threonine (OAST-rich
sequences).

Peak RT32 was sequenced directly and gave the sequence R-K-S-K-F-M-I-I-
P-A-S-O-T-O-A-O-T-O-I-N-E-I-S-F (SEQ ID NO:30) which, at the 5'-end, very
closely matched the N-terminal sequence (SEQ ID NO:11) obtained from N. alata
cell cultures, and which did not show a high content of hydroxyproline nor of OAST,
i.e., hydroxyproline, alanine, serine, and threonine. The 3'-end of the peak RT32
sequence (SEQ ID NO:30) comprised a domain characterized by a high OAST
content.

The results of amino acid analyses of chromatographic fractions from N.
plumbaginafolia AGPs are presented in Table 2.1. All AGPs that initially bound to
the chromatography columns showed an enrichment in hydroxyproline, alanine,
serine, and threonine residues.

In another embodiment of the invention, the total native AGPs were isolated 
from Pyrus communis (pear) Biopolymer by Yariv precipitation. 
The AGPs were either deglycosylated first and then separated by reverse
phase HPLC (RP-300) (Path 1), or alternatively, the total native AGPs were
fractionated first by reverse phase HPLC (RP-300), and then deglycosylated, digested
with thermolysin, and purified for sequencing (Path 2).

Path 1 (HPLC separation of deglycosylated AGPs) gave the profile shown in
Figure 3A. The results of amino acid analyses of major peaks (i.e., unbound, peak
RT16.4 and peak RT18.2), as summarized in Table 3.1, indicated enrichment of
hydroxyproline, alanine, serine, and threonine residues in the bound fractions. The
RT16.4 and the RT18.2 peaks from Figure 3A were subjected to thermolysin
digestion and the digestion products were separated on an RP-300 column. The RP-300
profile for digested RT16.4 is shown in Figure 3B and for RT18.2 is shown in
Figure 3C.

In all, only one peak (peak 1 of thermolysin-digested RT16.4, Figure 3B) was
a pure peptide and gave a clear sequence, L-S-O-K-K-S-O-T-A-O-S-O-S-(S)-T-O-O-
T-(T) (SEQ ID NO:31), which showed a high content of alanine, hydroxyproline,
serine, and threonine. Peaks 3 and 5 of RT16.4 (Figure 3B) comprised sequences
(SEQ ID NO: 11 and SEQ ID NO: 12, respectively) that also exhibited high contents
of hydroxyproline, alanine, serine, and threonine.

Peaks from thermolysin-digested RT18.2 (Figure 3C) were resolved into
several peaks (SEQ ID NOS:31, 34-38). These sequences also were characterized by
a high OAST content.

Path 2 (fractionation of the total native pear AGP fraction by reverse phase
HPLC) gave the profile presented in Figure 3D. Peak RT7.8 and the unbound
fraction were analyzed for amino acid composition and both were found to be
enriched in hydroxyproline, alanine, serine, and threonine as shown in Table 3.1.
Peak RT7.8 and the unbound fraction were deglycosylated and fractionated on HPLC.
The profile for the deglycosylated Peak RT7.8 (Figure 3E) showed a major peak
(Peak RT23) which, after thermolysin digestion and further purification on reverse
phase HPLC (RP-300), gave six peptide sequences. Five sequences (SEQ ID
NOS:39-44) were OAST-enriched, whereas one of the sequences, L-V-V-V-V-M-T-
P-R-K-H (SEQ ID NO:41), was also present in sequence obtained by direct
sequencing of the native AGP in RT7.8.

The unbound fraction of Figure 3D, after deglycosylation and further
fractionation on HPLC (Path 2), gave the profile presented in Figure 3F. The major
peaks RT16-19 in Figure 3F [obtained by Path 2 (separation followed by
deglycosylation)] had retention times similar to those of peaks RT16-19.9 in Figure
3A [obtained by Path 1 (deglycosylation followed by separation)].

It would appear from Figure 3D that Peak RT7.8 represents about 27% of the
total AGPs from pear. At least four N-termini were observed in one fraction, which
may represent multiple chains. The unbound fraction represents about 67% of the
total AGPs from pear and gives peaks which correspond to the RT16.4-19.9 of Figure
3A which gave several OAST-enriched sequences. Thus, the invention provides
amino acid sequence data from each of the two major AGPs from Pyrus communis.



In a particular embodiment of the invention, an AGP gene was obtained from
P. communis.

The sequence L-V-V-V-V-M-T-P-R-K-H (SEQ ID NO:41), which was
hydroxyproline-poor and OAST-poor, was selected as template for obtaining an AGP
gene from pear cell suspension culture.

A number of primers corresponding to the L-V-V-V-V-M-T-P-R-K-H
sequence (SEQ ID NO:41) were designed and synthesized for PCR experiments (Table
3.2).



TABLE 3.2

Sequences of the oligonucleotide primers used in PCR

Peptide sequence   L-V-V-V-V-M-T-P-R-K-H             (SEQ ID NO:41)
Primer designation:
PcA23F1      5' GTN GTN GTN GTN ATG AC 3'        (SEQ ID NO:45)
PcA23F2a     5' GTA GTN ATG ACN CCN AGA AA 3'    (SEQ ID NO:46)
                                      G
PcA23F2b     5' GTA GTN ATG ACN CCN CGN AA 3'    (SEQ ID NO:47)

N = A,T,G or C
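
The primer design of Table 3.2 can be illustrated by back-translating the peptide with IUPAC ambiguity codes. The Python sketch below is a hedged illustration rather than the procedure actually followed. The codon table is limited to the residues needed, the names `back_translate` and `degeneracy` are hypothetical, and the lone "G" printed under PcA23F2a is read here as an A/G alternative in the Arg codon (AGA or AGG), which is an interpretation of the flattened table.

```python
from itertools import product

# IUPAC codon families from the standard genetic code (N = any base, R = A/G).
CODON_FAMILIES = {
    "V": ["GTN"], "M": ["ATG"], "T": ["ACN"],
    "P": ["CCN"], "K": ["AAR"], "R": ["AGR", "CGN"],
}
FOLD = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "N": 4}

def back_translate(peptide):
    """All degenerate back-translations, one per choice of codon family."""
    families = (CODON_FAMILIES[aa] for aa in peptide)
    return ["".join(codons) for codons in product(*families)]

def degeneracy(oligo):
    """Number of distinct sequences in a degenerate oligonucleotide."""
    n = 1
    for base in oligo:
        n *= FOLD[base]
    return n

# V-V-V-V-M-T with the wobble base of Thr dropped gives the 17-mer PcA23F1.
f1 = back_translate("VVVVMT")[0][:17]

# T-P-R-K with the wobble base of Lys dropped: Arg needs two primer families,
# matching the 3' portions of PcA23F2a (AGR) and PcA23F2b (CGN) as read here.
tails = [s[:-1] for s in back_translate("TPRK")]

print(f1, degeneracy(f1))
print([(t, degeneracy(t)) for t in tails])
```

Splitting the six arginine codons into an AGR primer and a CGN primer keeps each mixture smaller than a single primer covering all six codons would be.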

The same nested PCR procedure used for the cloning of the NaAGP1 gene
(Figure 1D) was used to clone the gene encoding the above peptide, except that the
annealing temperature was 52°C in this case. A 350-bp fragment was amplified after
two successive PCR reactions using PcA23F1 as the first primer and
PcA23F2a as the second primer. The fragment was sequenced and found to encode
the correct peptide sequence (SEQ ID NO:48; Figure 3G).

The PCR fragment was used to screen a cDNA library made from mRNA
from pear cell suspension culture, as described above for N. alata cell suspension.
One positive clone (PcAGP23) was isolated and sequenced. This clone contained an
insert of 760 bp and matched the PCR sequence.

The PcAGP23 cDNA (SEQ ID NO:49) encodes an open reading frame, which
starts with an initiation codon (ATG) at position 20 and ends with a termination
codon (TAG) at position 560 (Figure 3H). The open reading frame encodes a
polypeptide containing 180 amino acid residues with a calculated molecular weight of
19.2 kD and a predicted pI of 8.46. The predicted amino acid sequence contains the
peptide sequence, L-V-V-V-V-M-T-P-R-K-H (SEQ ID NO:41), which was used for
the cloning of the PCR fragment. In addition, another peptide sequence, L-G-I-S-O-
A-O-S-O-A-G-E-V-D-(G), predicted from nucleotides 428-472, matches SEQ ID
NO:34 obtained from RT18.2 (Figure 3G). However, other sequences from peak
RT7.8 (SEQ ID NOS:39-44) are absent from the PcAGP23 sequence, indicating they
are from different AGP backbones.
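
The coordinates quoted for the PcAGP23 open reading frame can be cross-checked arithmetically. The sketch below assumes 1-based nucleotide numbering and that position 560 is the first base of the TAG codon. That reading is an assumption, chosen because it yields a whole number of codons, and the function names are hypothetical.

```python
def orf_residues(atg_start, stop_start):
    """Residues encoded between the first base of ATG and the first base of the stop codon."""
    coding_nt = stop_start - atg_start
    if coding_nt % 3:
        raise ValueError("coordinates do not span a whole number of codons")
    return coding_nt // 3

def residue_range(atg_start, nt_first, nt_last):
    """Residue numbers covered by an in-frame nucleotide interval (1-based, inclusive)."""
    if (nt_first - atg_start) % 3 or (nt_last - nt_first + 1) % 3:
        raise ValueError("interval is not in frame with the ORF")
    first = (nt_first - atg_start) // 3 + 1
    last = first + (nt_last - nt_first + 1) // 3 - 1
    return first, last

print(orf_residues(20, 560))          # residues in the predicted polypeptide
print(residue_range(20, 428, 472))    # residues covered by nucleotides 428-472
```

Under these assumptions nucleotides 428-472 are in frame with the ATG at position 20 and span 15 codons, the same length as the 15-residue peptide quoted above.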

The most abundant amino acid residues in the predicted protein sequence are
Ser (12.2%), Gly (10.5%), Leu (9.4%), Val (8.8%), Ala (7.2%) and Lys (7.2%)
[Table 3.3].
Table 3.3

Amino acid composition of the predicted PcAGP23 protein

                    Mol%
Amino Acid      +SP      -SP

Ser             12.2      9.8
Gly             10.5     11.1
Leu              9.4      7.8
Val              8.8      9.1
Ala              7.2      6.5
Lys              7.2      7.8
Thr              5.5      6.5
Pro              5.5      6.5
Glu              5.0      5.2
Phe              4.4      2.6
Asp              3.8      4.5
Asn              2.7      3.2
Tyr              2.7      2.6
Arg              2.7      3.2
Ile              2.7      3.2
Gln              2.2      2.6
Trp              2.2      2.6
Cys              1.6      1.9
His              1.6      1.9
Met              1.1      0.6

+SP: The putative secretion signal peptide is included.
-SP: The putative secretion signal peptide is excluded.

The PcAGP23 contains 5.5% Pro residues, some of which are post-translationally
modified to hydroxyproline, as identified by peptide sequencing. Relatively speaking,
the Pro and Ala residues are concentrated in the last 1/3 of the sequence (at the
C-terminus).

In the sequence of the PcAGP23 cDNA (SEQ ID NO:49), there is a putative
secretion signal at the N-terminus (1-27) with a potential cleavage site between Ala27
and Arg28. There are also two potential N-glycosylation sites at amino acid positions
36 and 87 (Figure 3H).
In another embodiment of the invention, the AGPs in a pear cell culture
filtrate were further purified as illustrated in the flow chart of Figure 5D. The
unbound fraction and the two minor bound fractions (Figure 5D-A), which accounted
for 72%, 0.9% and 0.1%, respectively, of total AGPs loaded on the column, were
purified as described above and in Example 3(a).

The major peak of Figure 5D-A, which accounted for approximately 27% of
the AGPs, was collected and reapplied to the same column. Upon elution with a
shallow gradient, two peaks (Fractions 1 and 2) were resolved (Figure 5D-B). The
AGPs in Fraction 1 were described above and in Example 3(a).

Size-exclusion FPLC fractionation of Fraction 2 resolved two components
(peaks 2A and 2B, Figure 5D-C3). Arabinose and galactose were the major
monosaccharides of each fraction (Table 3.4).



Table 3.4
Linkage Analysis of AGP fractions

Monosaccharide      Unbound         Fraction 1      Fraction 2 (Fig. 5D-C3)
and deduced         fraction        (Fig. 5D-C2)    Peak 2A     Peak 2B
linkage (mol%)      (Fig. 5D-C1)

Araf: terminal        34              36              24          18
      3-               3               3               4           4
      5-               2               3               1           1
Galp: terminal         7               8              12          14
      3-               5               4               8           5
      6-              10              10               8          23
      3,6-            38              36              44          35

Araf: Arabinofuranose; Galp: Galactopyranose



Arabinose was present mainly in the terminal position with small amounts of 3-linked
and 5-linked residues. Galactose was present mainly as 3,6-linked and terminal
residues in both peaks. However, the proportion of 6-linked galactosyl residues was
greater in Peak 2B than 2A, and both had small proportions of 3-linked residues.
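
The linkage data of Table 3.4 can be summarized per fraction. The following Python sketch transcribes the table values (reading the flattened table row by row, which is an assumption about the column order) and totals arabinose and galactose for each fraction. The dictionary layout and the function name `totals` are illustrative only.

```python
# mol% values transcribed from Table 3.4, in the order:
# Unbound, Fraction 1, Peak 2A, Peak 2B.
FRACTIONS = ["Unbound", "Fraction 1", "Peak 2A", "Peak 2B"]
LINKAGES = {
    ("Araf", "terminal"): [34, 36, 24, 18],
    ("Araf", "3-"):       [3, 3, 4, 4],
    ("Araf", "5-"):       [2, 3, 1, 1],
    ("Galp", "terminal"): [7, 8, 12, 14],
    ("Galp", "3-"):       [5, 4, 8, 5],
    ("Galp", "6-"):       [10, 10, 8, 23],
    ("Galp", "3,6-"):     [38, 36, 44, 35],
}

def totals(sugar):
    """Sum mol% over all linkages of one sugar, per fraction."""
    sums = [0] * len(FRACTIONS)
    for (name, _linkage), values in LINKAGES.items():
        if name == sugar:
            sums = [s + v for s, v in zip(sums, values)]
    return dict(zip(FRACTIONS, sums))

ara, gal = totals("Araf"), totals("Galp")
for fraction in FRACTIONS:
    print(fraction, ara[fraction], gal[fraction], ara[fraction] + gal[fraction])
```

Each column sums to within one percentage point of 100 mol%, which supports this reading of the table, and the galactose share is higher in Peaks 2A and 2B than in the unbound fraction and Fraction 1.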
Amino acid composition analysis of the AGPs in Peaks 2A and 2B is shown in
Table 3.5. N-terminal amino acid sequencing of material in Peak 2B gave the
sequence A-E-A-E-A-X-T-X-A-L-Q-V-V-A-E-A-X-E-L (SEQ ID NO:74).

AGPs in Peaks 2A and 2B were separately deglycosylated and the resulting
protein backbones isolated by size-exclusion FPLC (Fig. 5D-D1-4). The apparent Mr
of the proteins was different for each fraction. Peak 2B gave one protein backbone
(Mr 10k), whereas Peak 2A resulted in two protein peaks (Mr 10k and 54k). The 10k protein
backbone in Peak 2A is a contaminant from Peak 2B. N-terminal amino acid
sequencing of the 54k protein backbone gave the sequence T-O-A-O-A (SEQ ID
NO:75) while the 10k protein backbone in Peak 2B gave the sequence A-E-A-E-A-O-
T-O-A-L-Q-V-V-A-E-A-O-E-L (SEQ ID NO:76). The latter sequence is identical to
the N-terminal sequence obtained from the AGP in Peak 2B before deglycosylation,
assuming the unassigned residues "X" are Hyp.



The amino acid compositions of the 54k and 10k protein backbones are very
similar to those of their parent AGPs in Peaks 2A and 2B, respectively. The 54k
protein backbone contained a higher proportion of Hyp (27.5%) and Ser (18.4%) than
the 10k protein backbone in Peak 2B (Hyp, 19.5%; Ser, 6.0%). On the other hand,
the 10k protein backbone had a higher content of Glx (14.3%) and Val (10.1%) than
the 54k protein backbone in Peak 2A (Glx, 6.6%; Val, 4.2%) (Table 3.5). The 10k
and 54k protein backbones were digested separately with thermolysin and the
resulting peptides purified by RP-HPLC for sequencing.

Sequences of eight peptides were obtained from the 54k protein in Peak 2A and
three from the 10k protein in Peak 2B (Table 3.6). Two of the three sequences and
the N-terminal sequence overlap to give a sequence A-E-A-E-A-O-T-O-A-L-Q-V-V-
A-E-A-O-E-L-V-O-T-O-V-O-T-O-S-Y (SEQ ID NO:88) for the 10k protein in Peak
2B.
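
The assembly of overlapping peptides described above can be sketched as a suffix-prefix merge. The Python code below is an illustration under stated assumptions: the peptides are written in one-letter code with O for hydroxyproline, the garbled characters in the Peak 2B entries of Table 3.6 are read as O and T, and the function name `merge` and the minimum overlap of three residues are hypothetical choices.

```python
def merge(left, right, min_overlap=3):
    """Join two peptides on the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no overlap of the required length")

# Peak 2B peptides (O = hydroxyproline).
n_terminal = "AEAEAOTOALQVVAEAOEL"   # SEQ ID NO:76
peptide_85 = "VVAEAOELVOTOVOTOS"     # SEQ ID NO:85
peptide_86 = "LVOTOVOTOSY"           # SEQ ID NO:86

assembled = merge(merge(n_terminal, peptide_85), peptide_86)
print("-".join(assembled), len(assembled))
```

With these inputs the merge reproduces the 29-residue sequence given for the 10k protein backbone. A real assembly would also need to handle ambiguous residues ("X") and repeated motifs, which this sketch does not attempt.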
Table 3.6

Peptide sequences obtained from Peaks 2A and 2B

Peak      Peptide sequence

Peak 2A   T-O-A-O-A (N-terminal)                                  (SEQ ID NO:75)
          V-S-X-O-V-Q-S-O-A-X-O                                   (SEQ ID NO:77)
          V-X-X-O-V-Q-S-O-A-S-O-O-O-T-T                           (SEQ ID NO:78)
          I-S-O-A-S-T-O-O-T-                                      (SEQ ID NO:79)
          I-S-O-A-S-T-O-O-T-O-A-S-O-O-T                           (SEQ ID NO:80)
          F-S-O-T-I-S-O-A                                         (SEQ ID NO:81)
          X-A-(A)-T-O-S-L-D-V-G-I-O-S-S-N-A-T                     (SEQ ID NO:82)
          T/P-S-O-A-T-O-O-A-T                                     (SEQ ID NO:83)
          X-A-A-O-A-O-S-(O)-X-P-T-(N)-T                           (SEQ ID NO:84)

Peak 2B   A-E-A-E-A-X-T-X-A-L-Q-V-V-A-E-A-X-E-L (N-terminal)* #   (SEQ ID NO:74)
          A-E-A-E-A-O-T-O-A-L-Q-V-V-A-E-A-O-E-L (N-terminal)** #  (SEQ ID NO:76)
          V-V-A-E-A-O-E-L-V-O-T-O-V-O-T-O-S- #                    (SEQ ID NO:85)
          L-V-O-T-O-V-O-T-O-S-Y #                                 (SEQ ID NO:86)
          Y-T-E-R- #                                              (SEQ ID NO:87)

Note: All the residues of ambiguous assignments are shown; uncertain residues are in
brackets. "X" indicates no signal or an unknown residue. "O" represents
hydroxyproline. Sequences included in the cDNA are marked #.

*  Obtained from the AGP in Peak 2B before deglycosylation.
** Obtained from the deglycosylated protein backbone of the AGP in Peak 2B.



In another embodiment of the invention, AGPs were isolated from N. alata
styles. In this example, the total native N. alata style AGPs were not purified by the
Yariv reagent precipitation technique, but by ion exchange chromatography (IEC)
followed by gel filtration chromatography (GFC). The presence of AGP in column
fractions was verified by precipitation of AGP with a Yariv reagent. The AGPs were
then deglycosylated by HF and fractionated by reversed phase HPLC.

Two major peaks, RT25 and RT35 (Figure 4C), were obtained after
deglycosylation and HPLC fractionation. Amino acid analyses of each fraction and
the native materials are shown in Table 4.1.

Distinct differences are apparent in the amino acid composition between the
three fractions. The unbound fraction contains little Hyp but is rich in Gly, Glx, Ser
and Asx. The RT35 fraction is also Hyp-poor but rich in Asx, Glx and Ala.
Together, these two fractions account for the bulk of the Asx and Glx detected in the
native and deglycosylated AGPs. The amino acid composition of the material in
fraction RT25 is dominated by Hyp (18%), Ala (20%) and Ser (15%) with very little
Tyr. This RT25 protein backbone was thus selected for further analyses.
Peak RT25 gave four sequences (SEQ ID NOS:50-53) which are OAST-enriched.
Three of these sequences (SEQ ID NOS:50, 51, and 52) closely matched
SEQ ID NOS:27-29, respectively, for RT21 from N. plumbaginafolia.

An N-terminal sequence was not obtained for the RT25 peak. Pyroglutamate
aminopeptidase was then used to remove the N-terminal blocked pyroglutamate
residue and the sequence Ala-Hyp-Gly was obtained. The RT25 backbone was also
fragmented by treatment with the endoproteinase thermolysin. The resulting peptides
were separated and further purified by RP-HPLC. Six major peptides (Figure 4I)
were subjected to amino acid sequencing and four sequences were obtained (SEQ ID
NOS:50, 51, 53, 67). All the sequences were rich in Hyp, Ser and Ala (33 of 52
amino acid residues).

Endoproteinase Asp-N was also used to cleave the RT25 protein backbone at
the Asp residues. Two major peptides were produced (A1 and A2; Figure 4J),
indicating that there is only one Asp residue in the RT25 protein. The cleavage was
incomplete as indicated by the presence of the starting material (RT25 protein; Figure
4J). Peptide sequence was obtained for A2 (SEQ ID NO:68). The other peptide
(A1) gave no sequence data, indicating a blocked N-terminal residue. Overlaps were
identified between A2 (SEQ ID NO:68) (Figure 4J) and Peak 3 (SEQ ID NO:51)
(Figure 4I) and gave a continuous amino acid sequence of 26 residues:
LASOOAOOTADTOAFAOSGGVALPOS (SEQ ID NO:69).

Peak RT35 gave four sequences (SEQ ID NOS:54-57) which had a low OAST
content. Three of these sequences (SEQ ID NOS:55-57) were characterized by the
sequence T-A-I-N-T-E-F-G-P (SEQ ID NO:58).

In an alternative method of preparation, N. alata style AGPs were isolated
according to Bacic et al. (1988) Phytochem. 27:679-684. The sequence A-V-F-K-N-
K-X-X-L-T-X-X-P-X-H (SEQ ID NO:59) was obtained.

In other embodiments of the invention, AGP genes from N. alata style were
isolated. The cloning strategy of Figure 4D was used to obtain the genes. Several of
the peptide sequences of peak RT35 isolated from N. alata style contained the
sequence T-A-I-N-T-E-F-G-P (SEQ ID NO:58). In a specific embodiment, gene-
specific degenerate oligonucleotide primers were designed based on the sequence A-I-
N-T-E-F-G (SEQ ID NO:60) and a PCR fragment was amplified in vitro from style
RNA of N. alata. A 380-bp PCR fragment (SEQ ID NO:62; Figure 4E) was used to
screen a style cDNA library and a cDNA clone was isolated and fully sequenced.
The N. alata style cDNA clone was designated Na35_1. The insert of the cDNA
clone was 800 bp in length with a poly(A) tail at the 3'-end. The cDNA sequence
(SEQ ID NO:63) matched the PCR sequence except that it was 3 bp shorter at the
3'-end (Figure 4E and Figure 4F).

The Na35_1 sequence (SEQ ID NO:63) had an open reading frame starting
with an initiation codon (ATG) at position 21 and ending with a termination codon
(TAA) at position 530 (Figure 4F). The open reading frame encoded a polypeptide
containing 169 amino acid residues with a calculated molecular weight of 19.5 kD
and a predicted pI of 8.1. The most abundant residues in the sequence were: proline
(11.2%), phenylalanine (9.5%), alanine (7.7%), leucine (7.7%), and lysine (7.7%)
(Table 4.2).

The amino acid sequence derived from the N. alata style cDNA (SEQ ID
NO:63) comprised regions that matched peptide fragments of peak RT35 isolated from
N. alata styles, i.e., SEQ ID NO:55, SEQ ID NO:56, and SEQ ID NO:57. Northern
blot analyses of the Na35_1 gene (Figure 4G) indicated a specificity of the gene to N.
alata and to style tissue. Signals were not detected in transcripts from tomato style,
N. alata cell suspension, N. plumbaginafolia cell suspension, and pear cell suspension
(Figure 4H), indicating that the Na35_1 PCR fragment was specific for an N. alata
style AGP gene.

Further, the isolation of a different gene for an N. alata style AGP is
described in another specific embodiment of the invention. The five peptides isolated
from fragments of the AGP protein backbone (SEQ ID NOS:50, 51, 53, 67 and 68)
together gave 52 amino acid residues. Much of the sequence contained adjacent
residues of Hyp, Ser and Ala for which the codons are highly redundant and GC-rich.
However, the sequence TADTOAF from the continuous 26 amino acid sequence
resulting from the overlaps of the isolated peptides contained two amino acids which
are not GC-rich and only have two degenerate codons. This TADTOAF sequence
allowed design of an oligonucleotide suitable for PCR and the eventual cloning of the
AGPNa11 cDNA.

A gene-specific oligonucleotide (20 nucleotides) was designed from one region
of the continuous 26 amino acid sequence: TADTOAF (SEQ ID NO:70). Inosine
was used at the third position of the first two codons to reduce the degeneracy of the
oligonucleotide to 128. The resulting oligonucleotide contained 55% GC. cDNA
was synthesized from total style RNA using poly T linked with an adaptor sequence.
Rapid amplification of the cDNA 3' end (3' RACE) was performed using the gene-
specific primer together with a 3' primer in the adaptor sequence. A PCR fragment
of 400 base pairs (bp) was produced. This PCR fragment was cloned and sequenced.
The deduced amino acid sequence from this PCR clone matched isolated AGP
sequences, i.e., SEQ ID NOS:50, 51, 53, 67, 68.

Table 4.2

Amino acid composition of the predicted Na35_1 polypeptide
and purified RT35 peptide peak

Amino acid    Na35_1 cDNA    RT35 peptide

Asn/Asp          11.8           14.5
Thr               3.0            6.7
Ser               5.9            5.5
Gln/Glu           7.1           12.4
Pro/Hyp          11.2            7.8
Gly               1.8            4.3
Ala               7.7           10.8
Val               2.4            4.2
1/2 Cys           4.7            1.1
Met               2.4            3.0
Ile               7.7            4.8
Leu               7.7            6.6
Tyr               3.0            3.1
Phe               9.5            4.2
Trp               1.8            N.D.
Lys               6.5            5.3
His               1.8            2.1
Arg               4.1            3.4

N.D.: not determined
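
The degeneracy of 128 quoted for the TADTOAF oligonucleotide can be reproduced with a short calculation. The sketch below rests on assumptions not spelled out in the text: that the 20-mer covers the six codons for T-A-D-T-O-A plus the first two bases of the Phe codon, that hydroxyproline (O) is encoded as proline, and that an inosine at a wobble position counts as a single base. The function name `degeneracy` is hypothetical.

```python
from math import prod

# Number of codons per residue in the standard genetic code
# (O, hydroxyproline, is encoded as Pro and written P here).
CODONS_PER_RESIDUE = {"T": 4, "A": 4, "D": 2, "P": 4, "F": 2}

PEPTIDE = "TADTPAF"  # TADTOAF with O written as P

def degeneracy(peptide, inosine_codons=0, truncate_last=True):
    """Fold degeneracy of an oligonucleotide back-translated from `peptide`.

    inosine_codons: number of leading codons whose wobble base is inosine.
    truncate_last:  drop the wobble base of the final codon (assumed 20-mer).
    """
    folds = [CODONS_PER_RESIDUE[aa] for aa in peptide]
    if truncate_last:
        folds[-1] = 1
    for i in range(inosine_codons):
        folds[i] = 1
    return prod(folds)

length = 3 * (len(PEPTIDE) - 1) + 2
print(length, degeneracy(PEPTIDE), degeneracy(PEPTIDE, inosine_codons=2))
```

Under these assumptions the fully degenerate 20-mer would be a 2048-fold mixture, and inosine at the wobble positions of the first two codons reduces this sixteen-fold to the stated 128.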
The PCR clone was then used as a probe to screen a style cDNA library
(300,000 plaques). Two cDNA clones were obtained which differ only in the length
of the 3' and 5' ends. One of the clones, designated AGPNa11 (Figure 4K), was
used for all subsequent analyses. The 3' end of the AGPNa11 cDNA clone was
identical to the PCR clone except that the PCR clone was 20 bp shorter and contained
a poly A tail. The 712-bp AGPNa11 clone encodes a putative protein of 12.5 kD
(Figure 4K). The derived amino acid sequence includes sequences identical to
isolated AGP peptides (SEQ ID NOS:50, 51, 53, 67 and 68). Most of the proline
residues in the peptide sequences obtained by amino acid sequencing are
hydroxylated. A secretion signal peptide is predicted (Figure 4K and 4L). The
deduced N-terminus of the mature protein (10 kD; pI 6.8) is Gln-Ala-Pro-Gly which
matches the N-terminal sequence data obtained. The Pro residue in the N-terminal
sequence is also hydroxylated. The amino acid composition of the deduced mature
protein and the isolated RT25 protein backbone are in general agreement (Table 4.1).
The C-terminus of the deduced protein is very hydrophobic and predicted to be a
transmembrane helix.

The cDNA clone obtained (Figure 4K) predicts a 132 amino acid protein
characterized by hydrophobic stretches at both the N- and C-termini (Figure 4L). The
N-terminal hydrophobic sequence corresponds to a signal peptide which would lead to
secretion of the encoded protein. This is consistent with the known secretion and
extracellular localization of the style AGPs [Sedgley et al. (1985) Micron Microscop.
Acta 16:247-254]. Modification of the N-terminal residue, Gln, by intramolecular
cyclization to form pyroglutamate is not unusual. The cyclization could occur during
purification, or it could occur in situ and might be involved in the stabilization of the
AGP backbones. The same N-terminal sequence, Gln-Ala-Pro-Gly-Ala, is also
present in the AGP backbone isolated from pear (Figure 5A). The C-terminal
hydrophobic sequence is predicted to be a transmembrane helix (Figure 4L) which
might anchor the AGP in the plasma membrane. The hydrophobic C-terminal region
could also potentially enable the interaction of the AGP with other proteins, such as
S-RNase, which also contains a very hydrophobic sequence (in this case at the N-terminus
of the mature protein; Mau et al. (1986) Planta 169:184-191). The central
part of the protein contains most of the Hyp, Ala and Ser residues. The fact that most of
the Pro residues within the peptide sequences are hydroxylated suggests extensive
O-glycosylation in the central part of the protein. No potential N-glycosylation sites are
present. The abundance of potential O-glycosylation sites is consistent with the high
content of carbohydrate (85% w/w). Individual AGPs may differ in the types of
saccharide chains and in the number and location of glycosylation sites along the
protein backbone.

mRNA hybridizing to AGPNal 1 cDNA is present in most tissues of N. alata
and in the styles of related solanaceous species (Figure 4M), suggesting a general
role of this transcript (or closely related transcripts) in plant development. Various
tissues from N. alata were examined for the expression of the AGPNal 1 gene. As
shown in Figure 4M-1, mRNA transcripts of similar length, about 700-750
nucleotides, were detected in all tissues examined. This suggests that the AGPNal 1
gene or its homologs are expressed in many parts of the plant. Style, ovary, petal,
leaf and stem have similar levels of transcript, but the highest level of mRNA
expression is found in roots.

Some expression of hybridizing transcript was detected in the styles of N.
sylvestris and N. tabacum and a lower level in N. glauca and Lycopersicon
peruvianum (Figure 4M-2). Arabidopsis and rye grass (Lolium perenne, a monocot)
leaves had no detectable hybridizing transcript.

In another embodiment of the invention, an AGP gene was isolated from P.
communis using a guessmer oligonucleotide sequence encoding a hydroxyproline-rich
pear AGP segment and linked to a double-stranded promoter sequence for RNA
polymerase, allowing the synthesis of an antisense RNA probe (see Figure 1A)
(strategy B). Strategy B thus enabled the isolation of an AGP gene (SEQ ID NO: 66)
that specifically encodes a particular hydroxyproline-rich peptide segment (see Figure
5A). Hydroxyproline-rich and OAST-rich domains appear to represent characterizing
features of AGPs.

AGP peptide fragments were isolated and sequenced essentially as described in
Example 3(a). The sequence A-K-S-O-T-A-T-O-O-T-A-T-O-O-S-A-V (SEQ ID
NO:37) of an isolated pear AGP fragment exhibited hydroxyproline-enrichment and
OAST-enrichment. This sequence was selected for the isolation of a corresponding
pear AGP gene. The codon usage for proline is strongly biased towards CCA, which
accounts for 73.3% of all proline codons; the codon for alanine is biased, to a lesser
extent, to GCT (44.8%); there is no significant bias in codon usage for other amino
acids.

Two hybrid oligonucleotides (AF1T3 and AR2T7), each comprising a GC-enriched
sequence encoding a hydroxyproline-rich AGP segment, were constructed as
primers. The sequences of primers AF1T3 and AR2T7, each comprising a GC-rich
domain, are presented in Table 5.1. AF1T3 (SEQ ID NO:64) includes a T3
promoter sequence, a 42-bp GC-enriched nucleotide sequence corresponding to an
isolated N. plumbaginafolia AGP peptide fragment (SEQ ID NO:27) that is OAST-enriched,
and an 18-bp sequence corresponding to position 150-167 from the NaAGP1 cDNA (SEQ ID
NO:24). The AR2T7 primer (SEQ ID NO:65) consists of a T7 promoter, a 47-bp
GC-enriched nucleotide sequence corresponding to a hydroxyproline-rich (OAST-enriched)
AGP sequence from pear (SEQ ID NO:37) and another 18-bp sequence
corresponding to position 444-461 from the NaAGP1 cDNA (SEQ ID NO:24).
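Codon-usage figures of the kind quoted above (for example, CCA as 73.3% of all proline codons) are obtained by counting in-frame codons in known coding sequences. The following is a minimal sketch of such a count; the function name is a hypothetical helper and the toy reading frame is invented purely for illustration and is not taken from any sequence disclosed herein.

```python
from collections import Counter

PRO_CODONS = ("CCA", "CCT", "CCC", "CCG")


def synonymous_usage(cds, codons):
    """Return each codon's share among in-frame occurrences of `codons`."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    triplets = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(t for t in triplets if t in codons)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}


# Hypothetical toy reading frame (ATG CCA CCA CCT GCT CCA GCA TAA).
toy_cds = "ATGCCACCACCTGCTCCAGCATAA"
usage = synonymous_usage(toy_cds, PRO_CODONS)
print({codon: round(share, 2) for codon, share in usage.items()})
```

A strong bias toward a single codon, as reported for proline, is what makes a single-sequence "guessmer" a reasonable substitute for a fully degenerate probe.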

An antisense RNA probe was synthesized from the guessmer oligonucleotide
template by using T7 polymerase, and was used to screen a cDNA library prepared
from pear cell suspension culture essentially as described in Example 3(b). Three
cDNA clones were isolated and sequenced. The sequence of the longest clone,
PcAGP9 (SEQ ID NO:66), is shown in Figure 5A. The cDNA clone contains an
insert of 893 bp and

Table 5.1

Nucleotide sequences of the primers AF1T3 and AR2T7

AF1T3 (forward primer) (SEQ ID NO:64)

                        N-terminus >>>>>>>>>>>> C-terminus
   T3 promoter          ATOOAOOTADTPA
5' TGTTATTAACCCTCACTAAA GCATCACCACCAGCACCACCAACAGCAGACACACCAGCAG

   Nucleotides 150-167
   of the NaAGP1 cDNA
   CT ATGATCATACCTGCATCT 3'

AR2T7 (reverse primer) (SEQ ID NO:65)

                        C-terminus              N-terminus
   T7 promoter          ASOOTATOOTATO
5' NCTAATACGACTCACTATA GGCTGATGGTGGTGTTGCTGTTGGTGGTGTTGCTGTTGGT

            Nucleotides 444-461
   T K A    of the NaAGP1 cDNA
   GATTTTGCGGGAGTATCAGTCAAAAG 3'

Promoter sequences are underlined once. Sequences from NaAGP1 cDNA are
double underlined.



encodes an open reading frame of 145 amino acid residues. There is a putative
secretion signal peptide at the N-terminus. The predicted polypeptide is rich
in Pro, Ala, Ser, and Thr (Table 5.2) and contains two sequences which match
exactly two peptide sequences obtained previously from pear AGPs by protein
sequencing: AKSOTATOOTATOOSAV (SEQ ID NO:37) and
VTAOTOSASOOSSTOA(S)TXA (SEQ ID NO:38). The PcAGP9 sequence (with the
secretion signal included) gave an estimated pI of 10.79 and an apparent molecular
weight of 13.622 kD. The PcAGP9 sequence (excluding the secretion signal) gave an
estimated pI of 11.07 and an apparent molecular weight of 11.238 kD.
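The molar percentages reported in Table 5.2 follow directly from the residue counts. A minimal sketch of that arithmetic is given below, using the counts listed in Table 5.2 for the PcAGP9 sequence with the secretion signal included; the function name is a hypothetical helper, and the computed values may differ from the tabulated ones in the last digit depending on how the original figures were rounded.

```python
# Residue counts for PcAGP9 with the secretion signal, as listed in Table 5.2
# (residues with a count of zero are omitted).
COUNTS_WITH_SIGNAL = {
    "Pro": 30, "Ala": 29, "Ser": 25, "Thr": 17, "Val": 8, "Gly": 8,
    "Leu": 6, "Ile": 5, "Lys": 4, "Phe": 4, "Met": 3, "Gln": 2,
    "Asn": 1, "Asp": 1, "Arg": 1, "Cys": 1,
}


def mol_percent(counts):
    """Convert residue counts into molar percentages of the whole sequence."""
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


composition = mol_percent(COUNTS_WITH_SIGNAL)
past = sum(composition[aa] for aa in ("Pro", "Ala", "Ser", "Thr"))

print(sum(COUNTS_WITH_SIGNAL.values()))          # total residues
print(f"Pro {composition['Pro']:.1f} mol%")
print(f"Pro + Ala + Ser + Thr = {past:.1f} mol%")
```

The counts sum to the 145 residues of the open reading frame, and the four dominant residues together account for roughly 70 mol% of the sequence.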

As illustrated in the hydropathy profile of Figure 5C, the cDNA encodes three
domains: an N-terminal hydrophobic sequence corresponding to a secretion signal, a central
hydrophilic domain containing most of the proline residues, and a hydrophobic
C-terminal domain which is predicted to be a transmembrane helix. The N-terminus of
the mature protein corresponds to the sequence predicted from processing of the
secretion signal. The proline residues within the central region are mainly
hydroxylated and would bear the

Table 5.2

Amino acid composition of the PcAGP9 sequence

                + Secretion signal     - Secretion signal
Amino acid        No.     Mol%           No.     Mol%

Pro               30      20.6           30      24.5
Ala               29      20.0           26      21.3
Ser               25      17.2           24      19.6
Thr               17      11.7           16      13.1
Val                8       5.5            6       4.9
Gly                8       5.5            5       4.1
Leu                6       4.0            2       1.6
Ile                5       3.4            4       3.2
Lys                4       2.7            3       2.4
Phe                4       2.7            2       1.6
Met                3       2.0            0       0.0
Gln                2       1.3            1       0.8
Asn                1       0.6            1       0.8
Asp                1       0.6            1       0.8
Arg                1       0.6            1       0.8
Cys                1       0.6            0       0.0
Glu                0       0.0            0       0.0
His                0       0.0            0       0.0
Tyr                0       0.0            0       0.0
Trp                0       0.0            0       0.0

glycosyl chains. A cDNA encoding the protein backbone of an AGP from the styles
of Nicotiana alata has three domains with similar characteristics. Although the
amino acid composition of the proteins encoded by these cDNAs is similar, the only
common sequence is at the N-terminus of the mature proteins, Q-A-P-G-A-A
(SEQ ID NO: 73). Each cDNA encodes the protein backbone of a single AGP among the
several present in the plant extracts; the protein backbones are quantitatively a minor
part of these proteoglycans.

The central part (amino acids 24-123) of the sequence is dominated by four
amino acids (Pro, 29%; Ala, 19%; Ser, 23% and Thr, 15%). The dominant feature
of this part of the sequence is that the four residues are interspersed with each other;
there are no obvious motifs and few runs of any single amino acid. There are no
predicted N-glycosylation sites.

The C-terminal region of 22 amino acid residues is very hydrophobic and is
predicted to be a transmembrane helix [Eisenberg et al. (1984) J. Mol. Biol. 179:125-142;
Klein et al. (1985) Biochim. Biophys. Acta 815:468-476; Rao et al. (1986)
Biochim. Biophys. Acta 869:197-214]. There are several potential sites for proteolytic
cleavage (Endoprotease Asp-N, Ala↓Asp; V8 protease, Asp↓Ala; Clostripain
and Trypsin, Arg↓Val) around the border between the C-terminal transmembrane
helix and the extracellular domain [Allen et al. (1989) Sequencing of Proteins and
Peptides (2nd ed.); Drapeau (1978) Can. J. Chem. 56:534-544; (1980) J. Biol. Chem.
255:839-840]. These represent single cleavage sites, with the exception of trypsin, for which
there are several cleavage sites within the sequence.

The PcAGP9 cDNA was used to probe northern blots containing RNA from
six plants representing both dicotyledonous (Pyrus, Nicotiana, Brassica, Arabidopsis,
and Lycopersicon) and monocotyledonous (Lolium) plants (Figure 5B). At high
stringency (65 °C), a 0.9 kb transcript was detected in an RNA sample from
suspension culture cells of Pyrus communis. A smaller transcript was also detected in
pedicels of the same plant together with a larger transcript in N. plumbaginafolia
suspension culture cells (Figure 5B-2). Under reduced stringency conditions (55 °C),
RNA transcripts were also detected in all other RNA samples tested, indicating the
expression of AGP genes homologous to PcAGP9 in both dicotyledonous and
monocotyledonous plants tested (Figure 5B-1).

The PcAGP9 cDNA has similarity to the N. alata style cDNA (AGPNal 1
clone). In both cases the cDNA clones predict protein sequences composed mainly of
Pro, Ala, Ser and Thr. Despite the similarity in amino acid composition, these
cDNA clones have little sequence identity. In fact, the AGPNal 1 cDNA and
PcAGP9 cDNA did not cross hybridize at medium to high stringency on RNA blot
analysis; the AGPNal 1 detected a single 700-750 nt transcript in most tissues
examined while the PcAGP9 detected an 800-900 nt mRNA. Other AGP-like peptide
sequences have also been reported from N. plumbaginifolia, pear, L. multiflorum and
a histidine-rich HRGP from maize suspension cell culture filtrate [Kieliszewski et al.
(1992) Plant Physiol. 99:538-547]. Again, these peptides are composed mainly of
Hyp, Ala and Ser residues, yet the exact sequences are different. For example, the
Ala-Pro-Ala-Pro repeats present in L. multiflorum are not present in the deduced
amino acid sequences from the AGPNal 1 and PcAGP9 cDNAs.

In another embodiment of the invention, another P. communis cDNA
(PcAGP2; SEQ ID NO:91) was isolated and shown to be distinct from both the
PcAGP9 (SEQ ID NO:66) and the PcAGP23 (SEQ ID NO:49) clones. The approach
to cloning the PcAGP2 cDNA was essentially the same as for the PcAGP9 cDNA
(Example 5).

The 10k protein purified in FPLC as Peak 2B (Figure 5D-D4) and having the
amino acid sequence AEAEAOTOALQVVAEAOELVOTOVOTOSY (SEQ ID
NO: 88) was selected for the isolation of a corresponding pear AGP gene. Two
reverse and partially complementary long "guessmers" [AcF1 (SEQ ID NO:89) and
AcR2 (SEQ ID NO:90), Table 5.3] were synthesized.

Table 5.3

Nucleotide and corresponding peptide sequences
of the "guessmers" AcF1 and AcR2

AcF1 (SEQ ID NO: 89)

5' TTCCTGCAGAAGCAGAAGCACCAACACCAGCACTACAAGT AGTAGCAG 3'

AcR2 (SEQ ID NO: 90)

5' CTG GAGCTCA TATGATGGTGTTGGTACTGGTGTTGGTACTAG TTCTGGTGCTTCTGCTAC 3'

Note: Restriction enzyme cut sites incorporated into the guessmer for subcloning are
underlined. Reverse-complementary regions are double-underlined.

In the "guessmers," nucleotide A was used at the third codon position for all
amino acids, and CTA and TCA were assigned for Leu and Ser residues,
respectively. The last 18 bp at the 3' ends of the two "guessmers" were
reverse-complementary, and they were annealed to each other in PCR to produce a
double-stranded DNA fragment of 101 bp encoding the amino acid sequence
A-E-A-E-A-O-T-O-A-L-Q-V-V-A-E-A-O-E-L-V-O-T-O-V-O-T-O-S-Y (SEQ ID NO:88). The PCR
fragment was subcloned into the pBluescript II (KS) vector. A 32P-labeled antisense
RNA probe was synthesized using T3 RNA polymerase from the 101-bp DNA
fragment and used to screen a pear cDNA library. Five cDNA clones were isolated
and sequenced. The consensus sequence of 1040 bp is shown in Figure 5E. This
cDNA is referred to as PcAGP2 (SEQ ID NO:91).
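The back-translation rule stated above for the "guessmers" (A at the third codon position, with CTA for Leu and TCA for Ser) can be expressed as a simple lookup. The following is a sketch only: the codon table covers just the residues needed for the illustrated stretch, Hyp (O) is assumed to be encoded as Pro, and the function and table names are hypothetical. Tyr is deliberately left out because no Tyr codon ends in A (TAA is a stop codon), so the stated rule cannot apply to that residue.

```python
# One codon per residue, third position A, per the stated guessmer rule.
# Illustrative subset; residues outside this table raise KeyError.
GUESSMER_CODON = {
    "A": "GCA", "E": "GAA", "P": "CCA", "T": "ACA",
    "Q": "CAA", "V": "GTA", "L": "CTA", "S": "TCA",
}


def guessmer(peptide):
    """Back-translate a peptide (hyphens optional) into a single guessmer."""
    residues = peptide.replace("-", "").replace("O", "P")  # Hyp encoded as Pro
    return "".join(GUESSMER_CODON[aa] for aa in residues)


# First fifteen residues of the peptide of SEQ ID NO:88.
coding = guessmer("A-E-A-E-A-O-T-O-A-L-Q-V-V-A-E")
print(coding)
print(len(coding))  # 45
```

The output can be compared by eye with the coding portion of AcF1 in Table 5.3.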

The PcAGP2 cDNA sequence encodes a polypeptide of 294 residues and can
be divided into four domains (Figure 5E). The first 20 amino acids form a
hydrophobic sequence predicted to be a secretion signal with a potential cleavage site
between Ser20 and Phe21. The second domain (residues 21-51) is rich in Asn and
contains a stretch of five Asn residues. The third domain (residues 52-135) is rich in
Pro, Ala, Thr, and Glu. Most of these four residues are located in this domain. This
domain also includes all the peptide sequences obtained by protein sequencing. The
fourth domain (residues 136-294) is rich in Asn and Gly and contains two directly
repeated sequences of 34 residues. The amino acid composition of the deduced
protein, excluding the signal sequence, differs from that obtained from the
glycosylated and deglycosylated AGP in Peak 2B in that it is rich in Asn (14.2%),
Glu (8.0%), Gly (10.5%) and Ser (9.1%) (Table 3.5). However, the sequence from
residues 53 to 88 has an amino acid composition closely matching that obtained from
the AGP in Peak 2B.

Except as noted hereafter, standard techniques for isolation and purification of
proteins and protein fragments, sequencing, chromatography, cloning, DNA isolation,
amplification and purification, for enzymatic reactions involving DNA ligase, DNA
polymerase, restriction endonuclease and the like, the PCR technique and various
protein separation and purification techniques are those known and commonly
employed by those skilled in the art. A number of standard techniques are described
in Deutscher (1990) Methods in Enzymology 182:309-539; Maniatis et al. (1982)
Molecular Cloning, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York;
Wu (ed.) (1979) Meth. Enzymol. 68; Sambrook et al. (1989) supra; Wu et al. (1983)
Meth. Enzymol. 100 and 101; Grossman and Moldave (eds.) (1980) Meth. Enzymol.
65; Miller (ed.) (1972) Experiments in Molecular Genetics, Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York; Old and Primrose (1981) Principles of
Gene Manipulation, University of California Press, Berkeley; Schleif and Wensink
(1982) Practical Methods in Molecular Biology; Glover (ed.) (1985) DNA Cloning
Vols. I and II, IRL Press, Oxford, UK; Hames and Higgins (eds.) (1985) Nucleic
Acid Hybridization, IRL Press, Oxford, UK; Setlow and Hollaender (1979) Genetic
Engineering: Principles and Methods, Vols. 1-4, Plenum Press, New York; and
Deutscher (ed.) (1990) Guide to Protein Purification, Academic Press, New York.
Abbreviations and nomenclature, where employed, are deemed standard in the field
and commonly used in professional journals such as those cited herein.

It will be appreciated by those of ordinary skill in the art that the objects of
this invention can be achieved without the expense of undue experimentation using
well known variants, modifications, or equivalents of the methods and techniques
described herein. The skilled artisan will also appreciate that alternative means, other
than those specifically described, are available in the art to attain protein purification
and to achieve the functional features of the molecules described herein and how to
employ those alternatives to achieve functional equivalents of the molecules of the
present invention. It is intended that the present invention include those variants,
modifications, alternatives, and equivalents which are appreciated by the skilled
artisan and encompassed by the spirit and scope of the present disclosure.

The following examples are provided to better elucidate the practice of the
present invention and should not be interpreted in any way to limit the scope of the
present invention. Those skilled in the art will recognize that various modifications
can be made to the methods and genes described herein while not departing from the
spirit and scope of the present invention.

EXAMPLE 1. General method for the isolation and purification of
AGP peptides from plant cells comprising AGP

1. Preparation of cell suspension cultures 

Suspension cultures of plant cells comprising AGP were initiated from
cotyledons of seedlings germinated in the medium of Murashige and Skoog (1977)
Physiol. Plant. 15:473-497 supplemented with plant hormones, factors, buffers, salts,
etc., as are routinely used in the art to enhance and improve the quality of cell
growth.

2. Preparation of plant tissue extracts 

Plants were grown from commercial seed stock and were maintained under
standard glass house conditions.

3. Isolation of total AGPs

Total AGPs were prepared from suspension-cultured cells by precipitation of
AGPs from the culture medium with Yariv reagent [Yariv et al. (1967) Biochem. J.
105:1C], followed by dissociation of the AGP-Yariv reagent complex and recovery of
the AGP. Alternatively, total AGPs were prepared from plant tissue extracts by
(NH4)2SO4 precipitation, anion exchange chromatography and/or immunoaffinity
chromatography with, for example, an antibody specific for Gal 1-6-B-Gal sequences,
followed by gel filtration chromatography using, for example, a Superose matrix.

The AGPs of the total AGP fraction were separated using either ion exchange
or reverse phase HPLC. The individual AGPs were then subjected to amino acid
sequencing. Alternatively, the total AGP fraction was subjected to deglycosylation
using, for example, TFMS or HF, and the deglycosylated AGPs were separated either
on SDS-PAGE or reverse phase HPLC and prepared for amino acid sequencing. In
some cases, the peptides were digested by treatment with proteolytic enzymes before
separation of the different deglycosylated peptides.

Hydroxyproline-rich AGP fragments are separated from hydroxyproline-poor
fragments by chromatographic methods based on differentiating characteristics, e.g.,
polarity, immunogenicity, etc. For example, affinity chromatography supports to
which are attached ligands specific for amino acid R-group hydroxyls or antibodies to
a hydroxyproline-rich peptide fragment that is OAST-enriched are used to retain
preferentially hydroxyproline-rich peptides. Other protein purification techniques
useful in the separation of hydroxyproline-rich and hydroxyproline-poor fragments are
found in Deutscher, Guide to Protein Purification (1990) Methods in Enzymology
182.

EXAMPLE 2. Cloning of genes encoding a protein backbone of an
AGP from Nicotiana alata and N. plumbaginafolia

(a) Isolation and purification of AGP peptides from suspension cultures 
of Nicotiana alata 

1. Preparation of suspension cultures 

Suspension cultures of N. alata cells were initiated from cotyledons of
seedlings germinated in the medium of Murashige and Skoog (1977), supra,
supplemented with 1 g/l myo-inositol, 2 g/l Mes/KOH pH 5.7, 4% (w/v) sucrose, 0.1
mg/l gibberellic acid and 5 mg/l α-naphthaleneacetic acid. The cells were
subcultured weekly in this medium without gibberellic acid.

2. Purification, deglycosylation and sequencing of AGPs

Cells of N. alata were removed from the culture medium by filtration through
two layers of Miracloth. The supernatant was centrifuged (10,000 x g; 50 min) to
remove any cell debris. To the supernatant, NaCl and β-glucosyl Yariv reagent (Yariv
et al. 1967) were added to a final concentration of 1% and 0.2%, respectively. The
AGP-Yariv complex was pelleted by centrifugation (10,000 x g; 50 min), washed
twice with 1% NaCl, followed by centrifugation as above. The pellet was dissolved
in H2O and undissolved material removed by centrifugation (10,000 x g; 20 min).
The AGP-Yariv complex was re-precipitated by adding NaCl to 1%, and the
precipitate washed and redissolved in H2O. The Yariv precipitation and NaCl wash
steps were repeated twice. The AGP-Yariv precipitate was finally dissolved in H2O
and sodium dithionite (30%) was added to disrupt the AGP-Yariv complex. The
volume of the sample was reduced by Diaflo (YM30 membrane; Mr 30,000 Dalton
cut off) filtration and the solution desalted by passage through a PD10 column
(Pharmacia) equilibrated with 10 mM NH4HCO3.

AGPs from N. alata were deglycosylated by trifluoromethane sulphonic acid
(TFMS) using a modification of the procedure of Edge et al. (1981). The
deglycosylated AGPs were separated on 17.5% SDS-PAGE according to Laemmli
(1970). The 17.5% SDS-PAGE gels were run at 200V with thioglycollic acid (1 mM)
in the upper reservoir until the tracking dye reached the bottom of the gel. The
peptides were transblotted onto a PVDF membrane with blotting buffer [10 mM
3-(cyclohexylamino)-1-propanesulfonic acid (CAPS) buffer pH 11, 15% methanol,
thioglycollic acid (70 µl/l)]. Blotting was for 1.5 h at 90V with cooling. The blot
was stained with 0.1% Coomassie Blue in 50% methanol, 10% acetic acid for 5 min
and de-stained in 50% methanol, 10% acetic acid for 5 min. The blot was washed
with distilled water overnight and bands excised and sequenced. A major band
having a molecular weight of approximately 20-30 kD was obtained from the
deglycosylated N. alata AGPs.

3. Sequencing 

Purified protein was chromatographed on a reverse phase HPLC microbore
column prior to automated Edman degradation on a gas phase sequencer [Mau et al.
(1986), Planta 169:184-191]. Phenylthiohydantoin amino acids were analyzed by
HPLC, as described by Grego et al. (1985), Eur. J. Biochem. 264:857-862. An
N-terminal amino acid sequence, A-K-S-K-F-M-M-P-A-S-X-T-X-A (SEQ ID NO: 11),
was obtained.

(b) Cloning of genes from N. alata and N. plumbaginafolia cell cultures 

1. In vitro amplification of 5' end of the cDNA

Total RNA (10 µg) from N. alata suspension cultured cells was mixed with
1.0 pmoles gene specific radioactive primers in 10 µl of 40 mM PIPES (pH 6.0), 1
mM EDTA and 0.4 M NaCl. The mixture was heated at 80° for 5 min and
incubated at 37° overnight. The RNA/primer mixture was precipitated by ethanol
and resuspended in 20 µl of reverse transcription buffer containing: 50 mM Tris-HCl
(pH 8.3), 60 mM KCl, 10 mM MgCl2, 1 mM DTT, 20 U RNase inhibitor and 50 U
AMV reverse transcriptase. After 1 h incubation, the reaction was stopped by
addition of EDTA. The RNA was removed by treatment with RNase and the primer
extension product was purified by polyacrylamide gel electrophoresis.

The primer extension product was tailed with dGTP by terminal transferase
and amplified by PCR using a (dC)15-adaptor primer and the gene-specific primers.
The PCR was carried out in 100 µl solution containing: 1x PCR buffer (100 mM
Tris-HCl pH 8.3, 500 mM KCl), 2 mM MgCl2, 200 µM dNTPs, 100 ng poly dC
primer, 100-200 ng of gene-specific primer and 2.5 U of Taq DNA polymerase.
Samples were denatured by boiling for 5 min and then cooled to 80° before Taq
DNA polymerase was added. The PCR cycles were: 25X: 93°, 30 sec; 42°, 30 sec;
72°, 2 min; 4X: 93°, 30 sec; 42°, 30 sec; 72°, 5 min; and 1X: 93°, 30 sec; 42°,
30 sec; 72°, 10 min. The PCR product was subcloned and sequenced.
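
The three-stage cycling program above can be written down as data, which makes the total programmed hold time easy to check. This is a sketch only: the data layout and function names are illustrative assumptions, and the total excludes the initial boiling step and any temperature ramping, which are instrument dependent.

```python
# Each stage: (number of cycles, [(temperature in degrees, hold in seconds), ...])
PCR_PROGRAM = [
    (25, [(93, 30), (42, 30), (72, 120)]),
    (4,  [(93, 30), (42, 30), (72, 300)]),
    (1,  [(93, 30), (42, 30), (72, 600)]),
]


def total_cycles(program):
    """Total number of amplification cycles across all stages."""
    return sum(cycles for cycles, _ in program)


def total_hold_seconds(program):
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    return sum(
        cycles * sum(seconds for _, seconds in steps)
        for cycles, steps in program
    )


print(total_cycles(PCR_PROGRAM))             # 30
print(total_hold_seconds(PCR_PROGRAM) / 60)  # 110.0 minutes
```
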
2. In vitro amplification of 3'-end of the cDNA

cDNA was synthesized in a volume of 20 µl solution containing 10 µg total
RNA, 1x PCR buffer, 50 mM MgCl2, 10 mM dNTPs, 5 µM of dT(n) + adaptors,
30 U of RNasin and 50 U AMV reverse transcriptase at 42° for 1 h. cDNA (2 µl)
was subjected to the PCR reaction described above, except that the annealing
temperature was 60° in this case.

3. Screening of cDNA libraries with the PCR fragment

About 5 X 10* pfu phage/plate of cDNA libraries (in λZAP) were plated out.
After overnight growth at 37°, phage were blotted onto nitrocellulose membranes and
hybridized with 32P-labeled DNA fragment at 68° overnight in a hybridization buffer
containing 2x SSPE, 1% SDS, 0.5% BLOTTO, 1% PEG and 0.5 mg/ml carrier DNA
[Sambrook et al. (1989) supra]. The membranes were washed at 68° for 30 min in
1x SSC + 0.1% SDS and exposed to X-ray film. Positive λZAP clones were
converted into plasmid DNA by in vivo excision, as described in the Stratagene
instruction manual, for sequence analysis.

4. Purification and N-terminal sequencing of AGPs from the cell suspension culture 

The purified AGPs were deglycosylated with TFMS and the resulting peptides
separated on a 17.5% SDS-PAGE gel and blotted onto a PVDF membrane. The
major band (MW: 20-30 kD) (Figure 1C) was excised and sequenced. An N-terminal
peptide sequence, A-K-S-K-F-M-M-P-A-S-X-T-X-A (SEQ ID NO: 11), was obtained.

5. In vitro amplification of an AGP gene from N. alata cDNA by PCR

The strategy to clone the gene corresponding to the peptide sequence is
illustrated in Figure 1D. Two groups of degenerate reverse primers of 17 bp
corresponding to part of the AGP amino acid sequence were synthesized (Table 1.1).
When the group 1 primers were used in a primer extension experiment (Figure 1D-1),
a single 160-bp cDNA fragment was obtained (Figure 1E). The primers of group 1
were further divided into six subgroups each containing three 17-mers (Table 1.1).
Primer extension experiments showed that group NaR1 gave the highest yield of the
160-bp fragment and these oligonucleotides were therefore used as the gene-specific
primer in subsequent scale-up preparation of primer extension product and PCR
experiments. The 160-bp primer extension product was purified and tailed with
dGTP. The tailed, single-stranded cDNA was then amplified by PCR with the oligo
NaR1 and a (dC)15-adaptor as primers (Figure 1D-1). The PCR fragment was
subcloned and sequenced (SEQ ID NO:21; Figure 1E). The sequence included a
derived peptide which matched the sequence obtained from the isolated AGP
peptide (SEQ ID NO: 11). There was one mismatch: the Ala obtained from the
peptide sequencing was replaced with an Arg in the cDNA-derived sequence. On the
basis of this close match (8/9 amino acids), the 160-bp fragment was concluded to
represent a correct sequence for part of the gene.

Two specific primers with sequences 5'GATTATGGGTCATTTGACTAAGG3'
(SEQ ID NO:22) (NaF1) and 5'GGTGATCTCAACTCCATTGGTGG3' (SEQ ID NO:23)
(NaF2), corresponding to positions 56-78 and 101-123 (Figure 1E), were then
designed and used in conjunction with the two 3'-end nonspecific primers (Ad1 and
Ad2) to amplify the 3'-part of the AGP gene by nested PCR (Figure 1D-2). A 1.6-kb
fragment was amplified and sequenced. The alignment of the sequences obtained
from the two PCR reactions gave rise to a DNA sequence of 1679 bp (Figure 1F).
The PCR fragment encodes a protein containing the peptide obtained by protein
sequencing with two mismatches: Arg for Ala at position 1 and Pro for His at
position 12 (Figure 1F).
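A quick consistency check on the two nested primers is to confirm that each primer's length equals the span of positions it is said to occupy, and to report its GC content. The sketch below assumes the primer sequences as read above (with the trailing character of NaF2 taken as the 3' end marker); the dictionary layout and function names are hypothetical.

```python
# name: (sequence 5'->3', first position, last position) per the text above
PRIMERS = {
    "NaF1": ("GATTATGGGTCATTTGACTAAGG", 56, 78),
    "NaF2": ("GGTGATCTCAACTCCATTGGTGG", 101, 123),
}


def span_length(first, last):
    """Number of nucleotides covered by an inclusive position range."""
    return last - first + 1


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


for name, (seq, first, last) in PRIMERS.items():
    consistent = len(seq) == span_length(first, last)
    print(name, len(seq), consistent, f"GC={gc_fraction(seq):.2f}")
```
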

6. Isolation and sequence analysis of cDNA clones from N. alata and N.
plumbaginafolia cDNA libraries

The 1.6-kb PCR fragment was used to screen a cDNA library made from RNA
isolated from N. alata cells in suspension culture and three positive clones were
isolated and sequenced. The alignment of the PCR sequence with the cDNA
sequences gave rise to a 1700-bp sequence including a poly (A) tail of 7 bp (Figure
1F). This sequence is designated NaAGP1 (SEQ ID NO:24). Further primer
extension experiments suggested that the 1.7-kb NaAGP1 cDNA represents the
full-length sequence of the AGP transcript.

The NaAGP1 cDNA encodes an open reading frame, which starts with an
initiation codon (ATG) at position 60 and ends with a termination codon (TAA) at
position 1443 (Figure 1F).
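
The size of the encoded polypeptide can be estimated from the two codon positions. The sketch below assumes that both positions are 1-based and refer to the first nucleotide of the respective codon; the text does not state the numbering convention, so the result should be read as an illustration rather than a disclosed value. The function name is hypothetical.

```python
def orf_codons(start_pos, stop_pos):
    """Number of sense codons from the start codon up to the stop codon.

    Both positions are assumed to be 1-based and to mark the first base of
    the initiation and termination codons, respectively.
    """
    span = stop_pos - start_pos
    if span <= 0 or span % 3:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


print(orf_codons(60, 1443))  # 461
```

Under this assumption the two positions are in frame with each other, which is a useful sanity check on the reported coordinates.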

7. Northern and Southern blot analyses of the putative AGP gene 

The NaAGP1 cDNA was cut into a 5' half (1-540 bp) corresponding to the
nontranslated part, the transmembrane helix and the proline-rich domain and a 3' half
(541-1700 bp) including the asparagine-rich domain, C-terminus and the
3'-nontranslated part. These two parts of the cDNA were used separately to probe
northern blots of RNA [Sambrook et al. (1989) supra] isolated from
suspension-cultured cells of N. alata and N. plumbaginafolia and various tissues of N. alata
plants.

(c) Isolation and purification of AGP peptides from suspension cultures
of Nicotiana plumbaginafolia

1. Isolation of total native AGPs from N. plumbaginafolia Biopolymer

The total native AGPs were purified from the Biopolymer product by
precipitation with the Yariv reagent after depleting the starting material of pectins by
CTAB (hexadecyl trimethyl ammonium bromide) precipitation. The medium from
the cell suspension culture was separated from the
cells by filtration and the high molecular weight materials precipitated with four volumes of
ethanol. This is referred to as the Biopolymer product.

Biopolymer product (1 g) was dissolved in 1% NaCl solution (100 ml) and
filtered through two layers of Miracloth. The filtrate was centrifuged (10,000 x g, 10
min) and the supernatant collected. An equal volume of CTAB solution (2% CTAB
in 20 mM Na2SO4) was added. After 1 h incubation at 37°, the solution was filtered
through two layers of Miracloth and then centrifuged (10,000 x g, 20 min) to remove
any remaining precipitate. Four volumes of ethanol were then added to the
supernatant and centrifuged at 10,000 x g for 20 min. The pellet was dissolved in 100
ml of 1% NaCl solution and AGPs precipitated with Yariv reagent as described in
Example 2(a) 2. The desalted AGP sample was re-dissolved in 6 M guanidinium-HCl
and incubated at 50° for 15 min. The sample was then chromatographed on a FPLC
Superdex™ 75 column equilibrated with 6 M urea and 20 mM Tris-HCl, pH 8.8. The
void (Vo) fraction was collected, dialysed against distilled water and freeze dried.
This sample is the total native AGPs. The total native AGPs were treated by one of
two paths:

Path 1: Deglycosylation followed by reverse phase HPLC fractionation before
direct sequencing, or sequencing after enzymatic (proteolytic) digestion.

Path 2: Reverse phase HPLC fractionation followed by deglycosylation and
further separation by reverse phase HPLC fractionation.

Path 1 [comprising steps (2)-(5)]:

(2) Deglycosylation of total native AGPs using anhydrous HF 

The AGP sample was dried in a vacuum oven at 40° in the presence of
P2O5 overnight; 0.2 ml anhydrous MeOH and 1 ml of anhydrous HF [Mort and
Lamport (1977) Anal. Biochem. 82:289-309] were added and mixed well to dissolve all
the sample. This mixture was incubated at room temperature, under argon, for 3 h
and the HF removed by vacuum aspiration. Ice cold TFA (0.5 ml) was added and
the sample desalted on a PD-10 column equilibrated with 0.1% TFA, and freeze
dried. This sample is referred to as the total deglycosylated AGPs.

(3) Reduction and carboxymethylation of the total deglycosylated AGP sample

The total deglycosylated AGP sample was dissolved in 6 M guanidinium-HCl
(in 0.2 M Tris-HCl, pH 8.5 and 20 mM DTT; 600 µl) and incubated at 25°
under argon for 2 h. Freshly prepared iodoacetic acid (100 was added. The
mixture was incubated for 3 h at 25°, the reaction was stopped by addition of DTT
to 100 mM and, following dilution, the sample was chromatographed as above.

(4) HPLC separation of the total deglycosylated AGPs

After reduction and carboxymethylation, the total deglycosylated AGPs were
separated on an RP-300 HPLC column with a linear gradient (60 ml) (0-100% solvent
B; flow rate 1 ml/min) (solvent A: 0.1% TFA in water, solvent B: 60% acetonitrile
in solvent A). The profile is shown in Figure 2A. Two major peaks, RT21 and RT32
(retention times 21 min and 32 min, respectively), were collected for further analysis.
Amino acid analysis was performed on both peaks (see Table 2.1). The RT32 peak
was sequenced without further treatment. The RT21 peak was subjected to
thermolysin digestion before sequencing.
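
For orientation, the gradient parameters given above can be converted into a nominal solvent composition at a given retention time. The following is a minimal sketch, assuming that system dead volume and gradient delay are negligible (an assumption not stated in the text); the function and parameter names are illustrative.

```python
def acetonitrile_percent(retention_min, gradient_ml=60.0, flow_ml_per_min=1.0,
                         b_start=0.0, b_end=100.0, acn_in_b=60.0):
    """Nominal % acetonitrile (v/v) reaching the column at a given time.

    Assumption: dead volume and gradient delay are ignored, so the value is
    only a rough guide to the eluting composition.
    """
    gradient_min = gradient_ml / flow_ml_per_min        # 60 ml at 1 ml/min = 60 min
    fraction = min(max(retention_min / gradient_min, 0.0), 1.0)
    percent_b = b_start + (b_end - b_start) * fraction  # % solvent B
    return percent_b * acn_in_b / 100.0                 # solvent B is 60% acetonitrile


if __name__ == "__main__":
    for peak, rt in (("RT21", 21.0), ("RT32", 32.0)):
        print(peak, round(acetonitrile_percent(rt), 1))
```

With a 60-min gradient running to 60% acetonitrile, the nominal acetonitrile percentage is numerically equal to the retention time in minutes.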

(5) Thermolysin digestion of RT21 

RT21 sample (12 µg) was concentrated and Tween 20 added to give a final
volume of 100 µl with a final concentration of 0.01% Tween 20. NH4HCO3 (1% in
0.01% Tween 20; 500 µl), CaCl2 (0.1 M; 7 µl) and thermolysin (1 mg/ml; 7 µl) were
added and the mixture incubated at 55° for 3 h. The products were purified on
reverse phase HPLC and sequenced. The peptide sequences obtained are shown in
Figure 2A and were used to construct primers for cloning. The sequences
L-A-S-O-O-A-O-O-T-A (SEQ ID NO:26), L-A-S-O-O-A-O-O-T-A-D-T-O-A (SEQ ID NO:27),
F-A-O-S/N-G-G-V-A-L-P-O-S (SEQ ID NO:28), and I-G-A-A-O-A-G-S-O-T-S-S-P-N
(SEQ ID NO:29) from RT21 are either similar to or identical with those obtained from
fraction RT25 of N. alata styles (Figure 4C) and represent conserved,
tissue-nonspecific N. alata AGP fragments.
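
The composition of the digestion mixture follows from the volumes listed above. The following is a minimal sketch, assuming that the unit symbols in the passage are µl and µg and that the volumes are additive (both are assumptions); the function name is illustrative.

```python
def digest_mix(sample_ul=100.0, buffer_ul=500.0, cacl2_ul=7.0, enzyme_ul=7.0,
               cacl2_stock_mM=100.0, enzyme_stock_ug_per_ul=1.0, substrate_ug=12.0):
    """Final composition of the thermolysin digest (assumes additive volumes)."""
    total_ul = sample_ul + buffer_ul + cacl2_ul + enzyme_ul
    enzyme_ug = enzyme_stock_ug_per_ul * enzyme_ul          # 1 mg/ml = 1 µg/µl
    return {
        "total_ul": total_ul,
        "cacl2_mM": cacl2_stock_mM * cacl2_ul / total_ul,   # 0.1 M stock = 100 mM
        "enzyme_ug": enzyme_ug,
        "enzyme_to_substrate": enzyme_ug / substrate_ug,    # w/w ratio
    }


if __name__ == "__main__":
    for key, value in digest_mix().items():
        print(key, round(value, 3))
```

Under these assumptions the final volume is 614 µl and the final CaCl2 concentration is a little over 1 mM.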

Peak 32 gave the sequence R-K-S-K-F-M-I-I-P-A-S-O-T-O-A-O-T-O-I-N-E-I-S-F
(SEQ ID NO:30) which, at the 5'-end, matched very closely the N-terminal
sequence (SEQ ID NO:1) obtained from N. alata suspension culture.

Path 2 [comprising steps (6)-(8)]:

(6) HPLC fractionation of total native AGPs 

The total native AGPs sample was dissolved in 6 M guanidinium-HCl and left
at 50° for 15 min. The sample was then fractionated on reverse phase HPLC (RP-300;
4.6 mm x 10 cm column) with a linear gradient (60 ml) (0-100% solvent B;
flow rate 1 ml/min) (solvent A: 0.1% TFA in water, solvent B: 60% acetonitrile in
solvent A). A number of major peaks were obtained from this separation, all of which
reacted with Yariv reagent in a gel diffusion test (van Holst and Clarke, 1985)
(unbound, RT5, RT6, RT10, RT21-23 and RT34) (Figure 2.2). Each fraction was
quantified for AGP content (Table 2.1) as described by van Holst and Clarke (1985).
Amino acid analyses of each fraction of native AGPs are shown in Table 2.1.

(7) Deglycosylation of native AGP fractions from HPLC

Individual native AGP fractions from reverse phase HPLC (Figure 2B) were
deglycosylated using anhydrous HF as described above.

(8) HPLC separation of the deglycosylated AGPs

After deglycosylation, each sample was reduced and carboxymethylated before
reverse phase HPLC separation (Figures 2C and 2D). The fractions obtained were
reserved for further sequencing.



EXAMPLE 3. Cloning of a gene encoding a protein backbone of an 
AGP from P. communis suspension cultured cells 
(PcAGP23 clone) 

(a) Isolation and purification of AGP peptides from cell cultures of Pyrus
communis (pear)

1. Isolation of total native AGPs from Pyrus communis (pear) Biopolymer

The total native AGPs were purified by Yariv precipitation from pear
Biopolymer as described for AGPs of Nicotiana plumbaginafolia in Example 2(c)1.
The AGPs were deglycosylated and resulting peptides separated by reverse phase
HPLC (RP-300) (Path 1). Alternatively, the total native AGPs were fractionated by
reverse phase HPLC (RP-300), deglycosylated, digested with thermolysin and
peptides purified for sequencing.

Path 1 [comprising steps (2) and (3)]:

(2) HPLC separation of total deglycosylated AGPs for sequencing

The total native AGPs were deglycosylated using HF. The sample was
reduced and carboxymethylated before separation on reverse phase HPLC (RP-300) as
described in Example 2(c)(2). The profile is shown in Figure 3A. The results of
amino acid analysis of major peaks are summarized in Table 3.1.

(3) Separation of thermolysin digested peaks on a C18 microbore column

Deglycosylated AGP fractions (unbound, RT16.4 and RT18.2 from Figure 3A)
were subjected to thermolysin digestion. The products were separated on an RP-300
column (2.1 mm x 10 cm); linear gradient (6 ml) (0-100% B; flow rate at 0.1
ml/min) (solvent A: 0.1% TFA in water, solvent B: 60% acetonitrile in solvent A).
The unbound fraction after digestion remained unbound, i.e., gave no peptide which
bound to the RP-300 column. The RP-300 profile for digested RT16.4 is shown in
Figure 3B and for RT18.2 is shown in Figure 3C.

Individual peaks (peaks 1-5, Figure 3B) from thermolysin digested RT16.4
(Figure 3A) were separated on a C18 microbore column (2.1 mm x 10 cm) and
resolved on a linear gradient (50 ml, 0-50% B; flow rate 0.1 ml/min) (solvent A: 1%
NaCl, solvent B: 100% acetonitrile). Peaks were further separated on the same
column with TFA-acetonitrile system (solvent A: 0.1% TFA, solvent B: 60%
methanol in solvent A; 0-100% B in 60 min at 0.1 ml/min). Neither solvent system
gave further separation of peaks. Three of the peaks (peaks 1, 3 and 5) were
subjected to amino acid sequencing. Peak 1 was a pure peptide and gave the clear
sequence L-S-O-K-K-S-O-T-A-O-S-O-S-(S)-T-O-O-T-(T) (SEQ ID NO:31). Peaks 3
and 5 were not single peptides and at least two stretches of sequence were obtained
from each of these two peaks with less certainty. Peak 3 gave the sequence:
V/A-A/T-A-O-S/O-O/Y-S-S-T/A-X-O-S-A-T-X-T-X-X-V-A (SEQ ID NO:32); whereas
Peak 5 gave the sequence: V/A-A-D/A/O-S/O-T/O/K-O-S/O-P-Q-S (SEQ ID
NO:33).

Individual peaks (peaks 1-5, Figure 3C) from thermolysin digested RT18.2
were separated as described above for RT16.4. A number of peptides were obtained
and sequenced:

(i) L-G-I-S-O-A-O-S-O-A-G-E-V-D-(G) (SEQ ID NO:34)

(ii) X-X-O-O-A-A-O-V-X-A-O/S (SEQ ID NO:35)

(iii) V-T-A-O-T-O-S-A-S-O-O-S-S-T-(T)-A-A-T-(T)-A (SEQ ID NO:36)

(iv) A-K-S-O-T-A-T-O-O-T-A-T-O-O-S-A-V (SEQ ID NO:37)

(v) V-T-A-O-T-O-S-A-S-O-O-S-S-T-O-A-(S)-T-X-A (SEQ ID NO:38)

(vi) L-S-O-K-K-S-O-T-A-O-S-O-S-(S)-T-O-O-T-(T) (SEQ ID NO:31)

The last sequence is identical to the sequence obtained from Peak 1 of RT16.4.

Path 2 [comprising steps (4)-(7)]:

(4) Fractionation of total native AGPs by reverse phase HPLC

The total native AGPs samples were separated by reverse phase HPLC
essentially as described in Example 2(c)2-4. A number of major peaks were obtained
from this separation, all of which reacted with Yariv reagent in a gel diffusion test
(van Holst and Clarke, 1985) (unbound, RT7.8, RT17.2 and RT19.1) (Figure 3D).
Amino acid analyses of unbound and RT7.8 fractions are shown in Table 3.1.

(5) Deglycosylation of native AGP fractions from HPLC

Individual native AGP fractions from reverse phase HPLC were deglycosylated
using anhydrous HF as in Example 2(c)(7).

(6) HPLC separation of the deglycosylated AGPs

After deglycosylation, each sample was reduced and carboxymethylated before
separation on reverse phase HPLC (RP-300) as described previously. The profiles of
each sample are shown in Figure 3E and Figure 3F. The major peaks RT16-19 in
Figure 3F have retention times similar to those of the group of peaks RT16-19.9 in
Figure 3A. These peaks may arise from one component or a group of closely related
components.

(7) Thermolysin digest of deglycosylated pear AGPs

Peak RT23 from Figure 3E was digested with thermolysin and the resulting
peptides were further purified on reverse phase HPLC (RP-300). Six peptides were
selected for sequencing and gave the following amino acid sequences (also shown in
Figure 3):

(i) I-S-O-A-S-T/Q-O-O-T-T-S-O-A-S-O-O-T (SEQ ID NO:39)

(ii) V-S-P/S-O-V-Q-S-O-A-S-O-O-O-T-(T) (SEQ ID NO:40)

(iii) L-V-V-V-V-M-T-P-R-K-H (SEQ ID NO:41)

(iv) X-N-O-A-T-O-O-A-T/K-P (SEQ ID NO:42)

(v) I-A-A-T-O-S-(L) (SEQ ID NO:43)

(vi) (G)/(S)-N-A-O-A-O-X-O-K-P (SEQ ID NO:44)

(b) Cloning of genes from P. communis cell suspension culture 

To obtain an AGP gene from P. communis the methods and procedures
essentially as described for the cloning of genes from N. alata and N.
plumbaginafolia were followed.

A number of primers corresponding to the L-V-V-V-V-M-T-P-R-K-H (SEQ
ID NO:41) sequence (Figures 3D and 3E) were designed and synthesized for PCR
experiments (Table 3.2). The same nested PCR procedure used for the cloning of the
NaAGP1 gene (Figure 1B-2) was used to clone the gene encoding the above peptide,
except that the annealing temperature was 52°C in this case. A 350-bp fragment was
amplified after two successive PCR reactions using PcA23F1 as the first primer
and PcA23F2a as the second primer. The fragment was sequenced and found to
encode the correct peptide sequence (SEQ ID NO:48; Figure 3G).

The PCR fragment was used to screen a cDNA library made from mRNA
from pear cell suspension culture, as described above for N. alata cell suspension.
One positive clone (PcAGP23) was isolated and sequenced (SEQ ID NO:49; Figure
3H). This clone contained an insert of 760 bp and matched the PCR sequence.

EXAMPLE 4. Cloning of genes encoding a protein backbone of an
AGP from Nicotiana alata style

(a) Isolation and purification of AGP peptides from the styles of
Nicotiana alata

Total native AGPs of N. alata styles were purified by ion exchange
chromatography (IEC) and gel filtration chromatography (GFC). The AGPs were
then deglycosylated by HF and fractionated by reverse phase HPLC. Peptide
sequence data were obtained after thermolysin digestion of these deglycosylated
fractions.

1. Purification of total native AGPs 

Styles (500-1000 styles including the stigma) were collected fresh or were
stored at -70°C. The styles were ground with polyvinyl pyrrolidone (1% w/v) in the
presence of liquid nitrogen, and extraction buffer (50-100 ml; 100 mM Tris pH 8, 1
mM EDTA, 14 mM β-mercaptoethanol) was added. The mixture was centrifuged
(10,000 x g, 20 min) and cell debris discarded. The extract was brought to 95%
ammonium sulfate at 4°, centrifuged (10,000 x g, 20 min) and the supernatant
collected and concentrated by ultrafiltration using a Diaflo system (YM-30 membrane,
Mr 30 kD cut off) to about 10-20 ml. The solution was desalted on a PD-10 column
(Pharmacia) equilibrated with 10 mM Tris pH 8. The sample was applied to an FPLC
Mono Q column (Pharmacia; buffer A: 10 mM Tris pH 8; buffer B: 10 mM Tris
pH 8, 1 M NaCl; gradient: 0-30% B 15 min, 30-100% B 0.1 min). The bound AGP
fractions were detected by the Yariv reagent gel diffusion test on samples of each
fraction; AGP containing fractions eluted at about 5-15% buffer B (Figure 4A). The
AGP fractions were pooled, equilibrated into 10 mM NH4HCO3 with a PD-10
desalting column, freeze dried, and further purified on a Superose 6 column
(Pharmacia) in 6 M urea, 10 mM Tris pH 8 (Figure 4B). The AGP containing
fractions were exchanged as above into 10 mM NH4HCO3 and freeze dried.

Recovery of style AGP during the purification procedure is as follows: crude
style extract (1000 styles), 100%; 95% (NH4)2SO4 supernatant, 68.2%; Mono Q
anion-exchange column: unbound AGPs 5.4%, bound AGPs 44.5%; Superose 6 gel
filtration column, 25.4%. The presence of AGPs at different stages of purification is
demonstrated on SDS-PAGE gels in Figure 4N. Crossed-electrophoresis of AGPs
from styles of N. alata during fractionation is presented in Figure 4O.
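
The cumulative recoveries quoted above can be restated as step-by-step yields. The following is a minimal sketch, assuming that the bound Mono Q fraction is the material carried forward to gel filtration (our reading of the procedure); the stage labels and function name are illustrative.

```python
# Cumulative recovery (% of AGP in the crude extract) at each stage carried forward.
STAGES = [
    ("crude style extract", 100.0),
    ("95% ammonium sulfate supernatant", 68.2),
    ("Mono Q, bound AGPs", 44.5),
    ("Superose 6 gel filtration", 25.4),
]


def step_yields(stages):
    """Yield of each step relative to the stage immediately before it."""
    return [
        (name, 100.0 * current / previous)
        for (_, previous), (name, current) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for name, percent in step_yields(STAGES):
        print(f"{name}: {percent:.1f}% of previous stage")
```

Expressed this way, each individual step retains roughly 57-68% of the AGP entering it.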
2. Deglycosylation of total native AGPs and sequencing of peptides

Deglycosylation, peptide cleavage and sequencing were performed as
described in Example 2(c)2. Two major peaks, RT25 and RT35 (Figure 4C), were
obtained after deglycosylation as well as an unbound fraction. Amino acid analyses of
each fraction and the native materials are shown in Table 4.1. Each fraction was
digested with thermolysin. No peptide which bound to the RP-300 column (2.1 x 100
mm) was obtained from the unbound fraction. Three of the sequences from RT25,
F-A-O-S-G-G-V-A-L-P-O-S (SEQ ID NO:50), L-A-S-O-O-A-O-O-T-A-D-T-O-A (SEQ
ID NO:51), and I-G-S-A-O-A-G-S-O-T-S-S-P-N (SEQ ID NO:53), match closely those
obtained for RT21 from N. plumbaginafolia (SEQ ID NOS:27-29, respectively;
Figure 2A). A fourth fragment gave the sequence
I/V-G/S-A/S-A/O-O/S-A/Q-G/S-S/O-O/S-T/A-S/A-S/A-P-O (SEQ ID NO:52).

Since no N-terminal sequence was obtained for the RT25 protein backbone,
pyroglutamate aminopeptidase was used to remove the N-terminal blocked
pyroglutamate residue [20 µg pyroglutamate aminopeptidase (Boehringer Mannheim)
in 100 mM potassium phosphate buffer pH 8.0, 10 mM EDTA, 5 mM DTT, 5%
glycerol at 37°C overnight; deblocked protein was separated by RP-HPLC and
N-terminal amino acid sequencing was performed] and the sequence Ala-Hyp-Gly was
obtained. The RT25 backbone was also fragmented by treatment with thermolysin
[thermolysin (Boehringer Mannheim) at 0.2 µg/µg protein was added to RT25 protein
backbone (2-10 µg) and incubated at 55°C for 2 hours in 500 µl of 1% ammonium
bicarbonate, pH 7.8, 1 mM CaCl2 and 0.01% Tween 20]. The resulting peptides
were separated by RP-HPLC. Six major peptides were obtained (Figure 4I). Peak 2
gave the amino acid sequence VSAOSQSOSTAA (SEQ ID NO:67), as well as
IGSAOAGSOTSSPN (SEQ ID NO:53) and IGSAOAGSO (contained in SEQ ID
NO:53). Peak 3 gave the sequence LASOOAOOTADTOA (SEQ ID NO:51) and
peak 5 gave the sequence FAOSGGVALPOS (SEQ ID NO:50). These sequences were
rich in Hyp, Ser and Ala (33 of 52 amino acid residues).
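
The figure of 33 of 52 residues can be reproduced by counting Hyp (O), Ser (S) and Ala (A) over the four full-length thermolysin peptides listed above. The following is a minimal sketch; treating the count as covering SEQ ID NOS:67, 53, 51 and 50 together is our reading, and the names used are illustrative.

```python
from collections import Counter

# Thermolysin peptides of the RT25 backbone (one-letter code, O = hydroxyproline).
PEPTIDES = {
    "SEQ ID NO:67": "VSAOSQSOSTAA",
    "SEQ ID NO:53": "IGSAOAGSOTSSPN",
    "SEQ ID NO:51": "LASOOAOOTADTOA",
    "SEQ ID NO:50": "FAOSGGVALPOS",
}


def count_residues(peptides, residues="OSA"):
    """Return (number of selected residues, total residues) over all peptides."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    selected = sum(counts[r] for r in residues)
    return selected, total


if __name__ == "__main__":
    print(count_residues(PEPTIDES.values()))
```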

Endoproteinase Asp-N (Sigma; 0.1 µg/µg protein) was also used to cleave the
RT25 protein backbone at the Asp residue [30°C overnight in 500 µl of 1%
ammonium bicarbonate, pH 7.8, and 0.01% Tween 20], followed by separation with
RP-HPLC. Two major peptides were produced (peaks A1, A2; Figure 4J), indicating
that there is only one Asp residue in the RT25 protein. The cleavage was incomplete
as indicated by the presence of the starting material. The first peptide eluted (peak
A1) gave no sequence data, indicating a blocked N-terminal residue. The A2 peak
gave the sequence DTOAFAOSGGVAL (SEQ ID NO:68). The peptide sequence of
A2 (Figure 4J) overlaps with that of peak 3 (SEQ ID NO:51) (Figure 4I) and yields a
continuous amino acid sequence of 26 residues,
LASOOAOOTADTOAFAOSGGVALPOS (SEQ ID NO:69).

Four sequences were obtained from the RT35 peak of N. alata style:

(i) X-X-X-Q-S-A-O-A-A-(D)-X-N (SEQ ID NO:54)

(ii) X-T-F-S/A-Y/L-D/I-I-K/E-T/A-A-I-N-T-E-F-G-P-(E) (SEQ ID NO:55)

(iii) X-T-F-S/A-Y/L/V-D/I/A-I-E-T-A-I-N-T-E-F-G-P-X-E-X-X-Q (SEQ ID NO:56)

(iv) X-T-F-S-Y-D/I-K/E-T-A-I-N-T-E-F-G/M-P-A-E (SEQ ID NO:57)

Three of these sequences were characterized by the sequence T-A-I-N-T-E-F-G-P
(SEQ ID NO:58).
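
Returning to the RT25 backbone, the 26-residue sequence of SEQ ID NO:69 can be reproduced by joining the overlapping peptides end to end. The following is a minimal sketch of such a suffix-prefix merge; using the peak 5 peptide (SEQ ID NO:50) to supply the last three residues is our reading of how the 26 residues arise, and the function name is illustrative.

```python
def merge_overlap(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two peptides where a suffix of `left` equals a prefix of `right`.

    The longest overlap of at least `min_overlap` residues is used.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no overlap of the required length")


PEAK3 = "LASOOAOOTADTOA"   # SEQ ID NO:51 (thermolysin peak 3)
A2 = "DTOAFAOSGGVAL"       # SEQ ID NO:68 (Asp-N peak A2)
PEAK5 = "FAOSGGVALPOS"     # SEQ ID NO:50 (thermolysin peak 5)

if __name__ == "__main__":
    contig = merge_overlap(merge_overlap(PEAK3, A2), PEAK5)
    print(contig, len(contig))
```

The Asp-N peptide bridges the two thermolysin peptides: it shares four residues (DTOA) with peak 3 and nine residues (FAOSGGVAL) with peak 5.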

3. Purification of style AGPs by J539 affinity chromatography

AGPs were prepared from styles according to Bacic et al. (1988), Phytochem.
27:679-684. The sample was deglycosylated with TFMS, separated and blotted onto
a PVDF membrane as described in Example 1(b). A 30 kD band, running at the
same position as the major band prepared by Yariv precipitation from N. alata
suspension cultured cells [Example 1(b)], was sequenced. The sequence
A-V-F-K-N-K-X-X-L-T-X-X-P-X-I-I (SEQ ID NO:59) was obtained.

(b) Cloning of genes from N. alata styles

1. In vitro amplification of 3'-end of the cDNA

cDNA was synthesized in a volume of 20 µl solution containing 5 µg total
style RNA from N. alata, 1x PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl), 5
mM MgCl2, 1 mM dNTPs, 5 µM of oligo(dT) + adaptors, 30 U of RNasin and 50 U
AMV reverse transcriptase at 42° for 1 h. cDNA (2 µl) was subjected to polymerase
chain reaction. The PCR was carried out in 100 µl solution containing: 1x PCR
buffer, 1.5 mM MgCl2, 200 µM dNTPs, 30 pmole of the gene-specific primer
(Figure 1.2), 30 pmole of adaptor primer and 2.5 U of Taq DNA polymerase.
Samples were denatured by heating at 94° for 2 min and then cooled to 80° before
Taq DNA polymerase was added. The PCR cycles were: 35 x: 94°, 30 sec; 52°, 30
sec; 72°, 1 min 30 sec. The PCR product was subcloned and sequenced.

2. Screening of cDNA library with the PCR fragment

About 5 X 10^ pfu phage/plate of cDNA libraries (in λZAP) were plated out.
After overnight growth at 37°, phage were blotted onto nitrocellulose membranes and
hybridized with 32P-labeled PCR fragment at 65° overnight in a hybridization buffer
containing 0.22 M NaCl, 15 mM NaH2PO4, 1.5 mM EDTA, 1% SDS, 1% BLOTTO
and 4 mg/ml carrier DNA [Sambrook et al. (1989) supra]. The membranes were
washed at 65° for 2 x 15 min in 0.2x SSC and 1% SDS and exposed to X-ray films.
Positive λZAP clones were converted into plasmid DNA by in vivo excision as
described in the Stratagene instruction manual for sequence analysis.

3. Design of a gene-specific primer based on the AINTEFG sequence

As described in Example 2(c)2, pp. 37-38, the purified AGPs were
deglycosylated with HF and the resulting AGP backbones were separated on reverse
phase HPLC. Two major peaks, RT25 and RT35, were obtained after
deglycosylation as well as an unbound fraction. Amino acid sequences were obtained
from both peaks after protease digestion. Three of the four peptide sequences from
peak RT35 contain the sequence TAINTEFGP (SEQ ID NO:58). A degenerate
oligonucleotide was synthesized based on the sequence AINTEFG (SEQ ID NO:60).
The RT35-specific primers synthesized had the sequence:

5' GCI ATT AAT ACT GAA TTT GG 3' (SEQ ID NO:61)
         C   C   C   G   C
         A       A
                 G

where I is an inosine residue.
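
The degeneracy of such a primer is the product of the number of alternative bases at each position. The following is a minimal sketch for the AINTEFG primer, assuming that the alternative bases fall at the third position of the Ile, Asn, Thr, Glu and Phe codons as required by the genetic code, that the Glu codon reads GA(A/G), and that inosine counts as a single species; these placements are our reading of the layout, and the names used are illustrative.

```python
from itertools import product
from math import prod

# One entry per primer position; a multi-letter entry lists the alternatives.
# "I" = inosine, counted as a single species (assumption).
RT35_PRIMER = [
    "G", "C", "I",       # Ala  GCI
    "A", "T", "TCA",     # Ile  AT(T/C/A)
    "A", "A", "TC",      # Asn  AA(T/C)
    "A", "C", "TCAG",    # Thr  AC(T/C/A/G)
    "G", "A", "AG",      # Glu  GA(A/G)
    "T", "T", "TC",      # Phe  TT(T/C)
    "G", "G",            # Gly  GG (first two bases only)
]


def degeneracy(positions):
    """Number of distinct oligonucleotides in the degenerate mixture."""
    return prod(len(alternatives) for alternatives in positions)


def variants(positions):
    """Enumerate every oligonucleotide in the mixture."""
    return ["".join(bases) for bases in product(*positions)]


if __name__ == "__main__":
    print(len(RT35_PRIMER), degeneracy(RT35_PRIMER))
```

Under these assumptions the 20-mer is a mixture of 3 x 2 x 4 x 2 x 2 = 96 species; using inosine at the third position of the Ala codon avoids a further four-fold increase.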

(a) In vitro amplification of an AGP gene from N. alata cDNA by PCR

The strategy to clone the gene encoding the RT35 peptide sequence is
illustrated in Figure 4D. The RT35-specific primer was used in conjunction with the
adaptor primer in a polymerase chain reaction and a single 380-bp DNA fragment
was obtained. The PCR fragment was subcloned and sequenced (SEQ ID NO:62;
Figure 4E). The sequence included a derived peptide that matched the sequence
obtained from the isolated AGP peptide.

(b) Isolation and sequence analysis of a cDNA clone from N. alata

The 380-bp PCR fragment (SEQ ID NO:62; Figure 4E) was used to screen a
cDNA library made from RNA isolated from N. alata styles and one positive clone
was isolated and sequenced (SEQ ID NO:63; Figure 4F).

(c) Northern blot analyses of the Na35_1 gene

The Na35_1 PCR fragment was used to probe northern blots of RNA
[Sambrook et al. (1989) supra] isolated from various parts of N. alata plants (Figure
4G), L. peruvianum (tomato) style and suspension-cultured cells of N. alata, N.
plumbaginafolia and pear (Figure 4H). The Na35_1 probe hybridized to a style
transcript of 800 nucleotides, which corresponds to the length of the Na35_1 cDNA.
Longer exposure of the northern blot did not reveal any signal in other parts of the
plant (i.e., leaf, stem, root). The signal strength varied in different genotypes of N.
alata. The strongest signal was detected in RNA from S^Sg style. The same probe
did not detect any transcript from tomato style or suspension-cultured cells (Figure
4H).

4. Design of a gene-specific primer based on the TADTOAF sequence

(a) Oligonucleotide design and synthesis

A gene-specific primer 20 nucleotides long was designed according to the
overlapping peptide sequences of SEQ ID NOS:50, 51, 53, 67 and 68. Inosine was
used to reduce the degeneracy as shown:

TADTOAF (SEQ ID NO:70)

5' ACI GCI GAT ACT CCT GCT TT 3' (SEQ ID NO:71)
             C   A   A   A
                 C   C   C
                 G   G   G

The oligonucleotide was synthesized on an Applied Biosystems DNA synthesizer
(model 391, ABI).

(b) Rapid amplification of 3'-end of the cDNA (3' RACE)

Total RNA was isolated from N. alata styles as described by McClure et al.
(1990) Nature 342:955-957. Complementary DNA (cDNA) was synthesized from
total style RNA (5 µg) in a 20 µl solution containing 10 mM Tris-HCl, pH 8.3, 50
mM KCl, 5 mM MgCl2, 1 mM dNTPs, 5 µM oligo(dT) + adaptors, 30 U RNasin, and
50 U AMV reverse transcriptase (Promega) at 42°C for 1 hour. cDNA (2 µl) was
subjected to polymerase chain reaction (PCR) in 100 µl solution containing: 10 mM
Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 200 µM dNTPs, 30 pmole of the
gene-specific primer, 30 pmole of the adaptor primer and 2.5 U of Taq DNA
polymerase (Perkin Elmer-Cetus). Samples were denatured by heating at 96°C for 2
min and then cooled to 80°C before Taq DNA polymerase was added. The PCR
cycles were: 35 x: 96°C, 45 sec; 55°C, 45 sec; 72°C, 1 min. The PCR product
(400 bp) was cloned and sequenced on an Applied Biosystems DNA sequencer (model
373A, ABI). The deduced amino acid sequence from this PCR clone matched
isolated AGP sequences, i.e., SEQ ID NOS:50, 51, 53, 67, 68.
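
The reaction set-up above can be restated as a primer concentration and a nominal run time. The following is a minimal sketch, assuming that temperature ramp times are ignored and that the 2-min initial denaturation is counted once (both assumptions); the function names are illustrative.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in µM; 1 pmol per µl equals 1 µmol per litre."""
    return amount_pmol / volume_ul


def cycling_minutes(cycles: int, step_seconds, initial_denaturation_s: float = 0.0) -> float:
    """Nominal programme time in minutes (ramp times ignored)."""
    return (initial_denaturation_s + cycles * sum(step_seconds)) / 60.0


if __name__ == "__main__":
    # 30 pmole of each primer in a 100 µl reaction
    print(micromolar(30, 100))
    # 35 cycles of 45 s, 45 s and 60 s after a 2-min denaturation
    print(cycling_minutes(35, [45, 45, 60], initial_denaturation_s=120))
```

Each primer is therefore present at 0.3 µM, and the 35 cycles of 150 s each take about 1.5 hours before ramping is taken into account.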

(c) cDNA library screening

A style cDNA library (λZAP II; Stratagene) was constructed using mRNA
from styles (6 hours after touching) of N. alata (SgSg) by Dr. Joaquin Royo, Plant
Cell Biology Research Center, School of Botany, The University of Melbourne,
Parkville, Australia (PCBRC). The cDNA library (300,000 pfu) was plated out and
blotted onto Hybond-N nylon membranes (Amersham) according to the
manufacturer's instructions. The PCR fragment was labeled to 10^ cpm/µg with
32P-dGTP. Hybridization was carried out at 55°C overnight in 0.22 M NaCl, 15 mM
NaH2PO4, 1.5 mM EDTA, 1% SDS, 1% BLOTTO and 4 mg/ml herring sperm
DNA. The membranes were washed for 2 x 10 min at room temperature in 2x SSC,
1% SDS followed by 2 x 10 min at 55°C in 0.2x SSC, 1% SDS. Positive λZAP
clones were in vivo excised (Stratagene) and DNA sequences were analyzed. The
clone encoding the RT25 protein backbone was designated AGPNal 1 cDNA. The
nucleotide and deduced protein sequences were analyzed using the PC/Gene software
(IntelliGenetics).

(d) RNA Blot Analysis

RNA blot analysis was performed as described by Sambrook et al. (1989)
supra. Hybridization and washing conditions were the same as described above
except that the AGPNal 1 cDNA was used as probe and hybridization was carried out
at 60°C.

EXAMPLE 5. Cloning of an AGP gene from P. communis using an
antisense RNA probe

[1.] The PcAGP9 cDNA clone (SEQ ID NO:66) 

(a) Isolation and purification of AGP peptides from cell cultures of Pyrus
communis (pear)

The procedure essentially as described in Example 3(a) was followed to obtain
amino acid sequences of AGP peptide fragments. The sequence
A-K-S-O-T-A-T-O-O-T-A-T-O-O-S-A-V (SEQ ID NO:37) was selected as a template
for the isolation of a corresponding AGP gene.

(b) Cloning of a pear AGP gene encoding SEQ ID NO:37

In the previous examples of the invention (Examples 2, 3, and 4) AGP genes
were isolated by utilizing a hydroxyproline-poor sequence of an isolated AGP peptide
fragment to synthesize an oligonucleotide primer which was not enriched in GC. In
contrast, in this example (Example 5), a hydroxyproline-rich peptide sequence is
utilized for the construction of an antisense RNA probe.
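
Why a hydroxyproline-rich peptide leads to a GC-rich nucleic acid sequence can be illustrated by back-translation. Hydroxyproline arises by post-translational modification of proline, so each Hyp position is encoded by a proline codon (CCN). The following is a minimal sketch that computes the lowest and highest possible G+C count over all synonymous codon choices for SEQ ID NO:37 using the standard genetic code; the function and variable names are illustrative, and actual codon usage in pear is not considered.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))


def gc_range(peptide: str):
    """Return (min G+C, max G+C, total nucleotides) over all back-translations.

    'O' (hydroxyproline) is treated as proline, since Hyp is encoded by CCN.
    """
    low = high = 0
    for residue in peptide:
        residue = "P" if residue == "O" else residue
        gc_counts = [sum(base in "GC" for base in codon) for codon in CODONS[residue]]
        low += min(gc_counts)
        high += max(gc_counts)
    return low, high, 3 * len(peptide)


if __name__ == "__main__":
    print(gc_range("AKSOTATOOTATOOSAV"))  # SEQ ID NO:37
```

Because the Pro and Ala codons are fixed at G or C in their first two positions, and the Ser and Thr codons carry at least one, any coding sequence for this peptide has a G+C content between roughly 49% and 82%.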

The sequences of two oligonucleotides (AF1T3 and AR2T7) used for the
construction of a GC-rich probe are presented in Table 5.1.

An antisense RNA probe was synthesized from the PCR fragment by using T7
RNA polymerase (Promega) and used to screen a cDNA library prepared from pear
cell suspension culture. The hybridization was carried out at 40°C in hybridization
buffer containing 2x SSPE, 1% SDS, 0.5% BLOTTO, 50% formamide and 0.5
mg/ml denatured herring sperm DNA. After overnight hybridization, lifts were first
rinsed at room temperature with 2x SSC, 0.1% SDS and then washed at 50°C with
the same buffer for 30 min. The lifts were finally washed at 50°C with 1x SSC,
0.1% SDS for another 30 min. Three cDNA clones were isolated and sequenced.
The sequence of the longest cDNA clone, PcAGP9 (SEQ ID NO:66), is shown in
Figure 5A.

[2.] The PcAGP2 cDNA clone (SEQ ID NO:91)

(a) Further purification of AGP peptides from cell cultures of Pyrus
communis (pear)

AGPs in pear cell culture filtrate were purified by precipitation with the
β-glucosyl Yariv reagent and fractionated by HPLC as described in Example 3(a). A
flow chart of the purification procedure is presented in Figure 5D. The major peak
of Figure 5D-A, which accounted for approximately 27% of the AGPs loaded onto
the column, was collected and reapplied to the same column. Upon elution with a
shallow gradient, two peaks (Fractions 1 and 2) were resolved (Figure 5D-B). The
AGPs in Fraction 1 were described in Example 3 and Example 5[1].

Fraction 2 (Figure 5D-B) was subjected to size-exclusion fractionation on
Superose 6 FPLC and was resolved into two components, peaks 2A and 2B (Figure
5D-C3). N-terminal amino acid sequencing of material in Peak 2B gave the sequence
AEAEAXTXALQWAEAXEL (SEQ ID NO:74).

AGPs in Peaks 2A and 2B were separately deglycosylated and the resulting
protein backbones were isolated by size-exclusion FPLC (Figure 5D-D1-4). Peak 2B
gave one protein backbone with a molecular weight of 10k. Peak 2A resulted in two
protein peaks having molecular weights of 54k and 10k. N-terminal amino acid
sequencing of the 54k protein backbone gave the sequence TOAOA (SEQ ID NO:75),
while the 10k protein backbone in Peak 2B gave the sequence
AEAEAOTOALQWAEAOEL (SEQ ID NO:76).

The 10k and 54k protein backbones were digested separately with thermolysin
and the resulting peptides were purified by RP-HPLC for sequencing. Sequences of
eight peptides were obtained from the 54k protein of Peak 2A and three from the 10k
protein in Peak 2B (Table 3.6). Two of the three sequences and the N-terminal
sequence overlap to give a sequence AEAEAOTOALQWAEAOELVOTOVOTOSY
(SEQ ID NO:88) for the 10k protein in peak 2B.

(b) Isolation of a cDNA encoding the 10k AGP protein backbone

The approach to cloning of cDNA encoding the 10k protein backbone was
essentially the same as that used to clone the PcAGP9 cDNA. Five cDNA clones
were isolated and sequenced. The consensus sequence of 1040 bp is shown in Figure
5E. This cDNA is referred to as PcAGP2.

EXAMPLE 6: Cloning and Expression of Genomic AGP Genes 

(a) Cloning of genomic AGP genes and identification of an AGP
promoter region

The procedure essentially as used for the isolation of cDNA clones is used to
obtain a genomic clone of a plant AGP. Whenever possible, AGP cDNA clones will
be used to screen genomic libraries. The following procedure describing the isolation
of a genomic AGP clone from suspension-cultured cells of N. alata and N.
plumbaginafolia represents a general procedure which can be adapted for the isolation
of a genomic AGP gene from a desired plant cell.

To isolate an AGP genomic clone, genomic DNA is isolated from
suspension-cultured cells of N. alata and N. plumbaginafolia and partly digested with
Sau3AI. After size selection by ultracentrifugation on a glycerol gradient, DNA
fragments of 10-23 kb in size are ligated into vectors such as λDASH (Stratagene) to
form a genomic library. The libraries are then screened with the NaAGP1 and NpAGP1
cDNAs, respectively, to isolate their corresponding genomic clones. The resulting
genomic clones are studied by Southern analysis and some clones are sequenced. The
promoter region of the AGP gene is then identified from the DNA sequence.

(b) Recombinant Gene Construction

The expression of a plant gene which exists in double-stranded DNA form
involves transcription of messenger RNA (mRNA) from one strand of the DNA by
RNA polymerase enzyme, and the subsequent processing of the mRNA primary
transcript inside the nucleus. This processing involves a 3'-nontranslated region
which signals the addition of polyadenylate nucleotides to the 3'-end of the RNA.
Transcription of DNA into mRNA is regulated by a promoter. The promoter region
contains a sequence of bases that signals RNA polymerase to associate with the DNA
and to initiate the transcription of mRNA using one of the DNA strands as a template
to make a corresponding strand of RNA.

A number of promoters which are active in plant cells have been described in
the literature. These include the nopaline synthase (NOS) and octopine synthase
(OCS) promoters (which are carried on tumor-inducing plasmids of Agrobacterium
tumefaciens), the Cauliflower Mosaic Virus (CaMV) 19S and 35S promoters, the
light-inducible promoter from the small subunit of ribulose bis-phosphate carboxylase
(ssRUBISCO) and the mannopine synthase (MAS) promoter [Velten & Schell (1985)
Nucl. Acids Res. 13:6981-6998]. All of these promoters have been used to create
various types of DNA constructs which have been expressed in plants (see, e.g., PCT
publication WO84/02913).

Promoters which are known or are found to cause transcription of RNA in
plant cells can be used in the present invention. Such promoters may be obtained
from plants or plant viruses and include, but are not limited to, the CaMV35S
promoter and promoters isolated from plant genes such as ssRUBISCO genes. It is
preferred that the particular promoter selected should be capable of causing sufficient
expression to result in the production of an effective amount of protein.

The promoters used in the DNA constructs (i.e., chimeric plant genes) of the
present invention may be modified, if desired, to affect their control characteristics.
For example, the CaMV35S promoter may be ligated to the portion of the
ssRUBISCO gene that represses the expression of ssRUBISCO in the absence of light,
to create a promoter which is active in leaves but not in roots. The resulting
chimeric promoter may be used as described herein. For purposes of this description,
the phrase "CaMV35S" promoter thus includes variations of CaMV35S promoter,
e.g., promoters derived by means of ligation with operator regions, random or
controlled mutagenesis, etc. Furthermore, the promoters may be altered to contain
multiple "enhancer sequences" to assist in elevating gene expression.

The RNA produced by a DNA construct of the present invention also contains
a 5'-nontranslated leader sequence. This sequence can be derived from the promoter
selected to express the gene, and can be specifically modified so as to increase
translation of the mRNA. The 5'-nontranslated regions can also be obtained from
viral RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. The
present invention is not limited to constructs as presented in the following examples.
Rather, the nontranslated leader sequence can be part of the 5'-end of the
nontranslated region of the coding sequence for the virus coat protein, or part of the
promoter sequence, or can be derived from an unrelated promoter or coding sequence
in any case. It is preferred that the sequence flanking the initiation site conform to
the translational consensus sequence rules for enhanced translation initiation reported
by Kozak (1984) Nature 308:241-246.
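
By way of illustration only, the sequence flanking an initiation codon can be screened computationally. The Python sketch below assumes a commonly cited simplification of the Kozak rules, namely a purine at position -3 and a G at position +4 (with the A of ATG numbered +1). That simplification, the function name and the three-level classification are assumptions of this sketch and are not taken from the present description.

```python
def kozak_context(seq, atg_index):
    """Classify the initiation context around the ATG starting at atg_index.

    Assumed simplification: purine (A or G) at -3 and G at +4,
    where the A of the ATG is numbered +1.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Positions that fall outside the sequence are treated as non-matching.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"
```

A leader whose context scores as "weak" under this simplified test would be a candidate for the specific modification of the 5'-nontranslated sequence mentioned above.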

The DNA construct of the present invention also contains a modified or
fully-synthetic structural coding sequence which has been changed to enhance the
performance of the gene in plants. For example, the enhancement method can be
applied to design modified and fully synthetic genes encoding a plant AGP protein.
The structural genes of the present invention may optionally encode a fusion protein
comprising an amino-terminal chloroplast transit peptide or secretory signal sequence,
etc.

The DNA construct also contains a 3'-nontranslated region. The
3'-nontranslated region contains a polyadenylation signal which functions in plants to
cause the addition of polyadenylate nucleotides to the 3'-end of the viral RNA.
Examples of suitable 3'-regions are (1) the 3'-transcribed, nontranslated regions
containing the polyadenylation signal of Agrobacterium tumor-inducing (Ti) plasmid
genes, such as the nopaline synthase (NOS) gene, and (2) plant genes like the soybean
storage protein (7S) genes and the small subunit of the RuBP carboxylase (E9) gene.
An example of a preferred 3'-region is that from the 7S gene.
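
The order of the elements of such a DNA construct (promoter, 5'-nontranslated leader, structural coding sequence, 3'-nontranslated region) can be summarized in a short Python sketch. The class name, field names and validation rules are hypothetical and serve only to illustrate the layout; any sequences passed in would be placeholders, not the sequences of the invention.

```python
from dataclasses import dataclass

STOP_CODONS = ("TAA", "TAG", "TGA")


@dataclass(frozen=True)
class ExpressionCassette:
    """Hypothetical container for the four construct elements, listed 5' to 3'."""
    promoter: str
    leader: str
    coding_sequence: str
    three_prime_region: str

    def validate(self):
        cds = self.coding_sequence.upper()
        if not cds.startswith("ATG"):
            raise ValueError("coding sequence must start with ATG")
        if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
            raise ValueError("coding sequence must be in frame and end with a stop codon")

    def assemble(self):
        """Concatenate the elements in 5'-to-3' order after a basic sanity check."""
        self.validate()
        parts = (self.promoter, self.leader, self.coding_sequence, self.three_prime_region)
        return "".join(parts).upper()
```

Keeping the four elements as separate fields mirrors the modular nature of the construct: a promoter or 3'-region can be exchanged without touching the coding sequence.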

(c) Plant Transformation.

A chimeric plant gene containing a structural coding sequence of the present
invention can be inserted into the genome of a plant by any suitable method. Suitable
plants for use in the practice of the present invention include, but are not limited to,
soybean, cotton, alfalfa, oilseed rape, flax, tomato, sugarbeet, sunflower, potato,
tobacco, maize, rice and wheat. Suitable plant transformation vectors include those
derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed,
e.g., by Herrera-Estrella et al. (1983) Nature 303:209, Bevan et al. (1983) Nature
304:184, Klee et al. (1985) Bio/Technology 3:637-642, and EPO publication
120,516. In addition to plant transformation vectors derived from the Ti or
root-inducing (Ri) plasmids of Agrobacterium, alternative methods can be used to insert
the DNA constructs of this invention into plant cells. Such methods may involve, for
example, the use of liposomes, electroporation, chemicals that increase free DNA
uptake, free DNA delivery via microprojectile bombardment, and transformation
using viruses or pollen.

A useful Ti plasmid cassette vector for transformation of dicotyledonous
plants, for example, may consist of the enhanced CaMV35S promoter and the 3'-end
including polyadenylation signals from a soybean gene encoding the alpha-prime
subunit of beta-conglycinin. A multilinker containing multiple restriction sites for the
insertion of genes may be positioned between these two elements.

(d) Over- and under-production of AGPs by transformed cell lines.

It is generally acknowledged that all plant natural cell lines produce some
AGPs, probably at the level of approximately 2-10% (w/w) of total structural
complex carbohydrate [Showalter (1993) Plant Cell 5:9-23]. These natural plant cells
comprise all the regulatory factors (promoters, enhancers, enzymes, etc.) for
transcription, translation and post-translational processing to produce a glycosylated
AGP as the natural product. Glycosylation comprises the steps of (a) proline
hydroxylation with a prolyl hydroxylase, (b) galactosylation using a unique
6-Hyp-galactosyl transferase, (c) the addition of galactose chains by a separate galactosyl
transferase for each linkage type, and (d) the addition of arabinose by arabinosyl
transferase. Thus, cultured natural plant cells (e.g., monocots or dicots) can be
transformed with heterologous recombinant gene fragments and used for
overproduction or underproduction of nonglycosylated AGPs. In some cases, a dicot
host may be transformed with a monocot gene or, alternatively, a monocot host may
be transformed with a dicot gene. Alternatively, a host cell which normally does not
produce glycosylated AGP (e.g., E. coli) may be transformed and used for the over-
or under-production of a nonglycosylated AGP peptide backbone in which the proline
residues have not been hydroxylated.

To transform a host cell for overproduction of AGP, an AGP cDNA (e.g.,
NaAGP1 or NpAGP1) is linked at the 5'-end with a heterologous promoter (e.g.,
CaMV 35S promoter) and at the 3'-end with a terminator (e.g., NOS-terminator).
Thus, the AGP gene will be under the control of the CaMV 35S promoter, which is
known to be a strong promoter. This expression cassette is then subcloned into a
binary vector derived from the A. tumefaciens Ti plasmid to transform the cultured
cells of either N. alata or N. plumbaginafolia to create cell lines that overproduce
AGPs. The AGP is also tagged by histidines at the C-terminus by introducing a
six-histidine coding DNA fragment into the AGP cDNAs. The six-histidine tagged AGP
can then be readily isolated by using a nickel-nitrilotriacetic acid Sepharose column
(Hochuli et al., 1988, Bio/Technology 6:1321-1325). An alternative approach is to
use the tag, Flag™, [Hopp, T.P. et al. (1988) Bio/Technology 6:1204-1210], which can
be incorporated into the AGP sequence to allow purification with an anti-Flag™
monoclonal antibody.
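
The introduction of a six-histidine coding fragment at the C-terminus can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the choice of CAC (rather than CAT) as the histidine codon is arbitrary, and the tag is assumed to be placed immediately before the stop codon of an in-frame coding sequence. The description does not specify these details.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")
HIS_CODON = "CAC"  # histidine is encoded by CAC or CAT; CAC is an arbitrary choice


def add_c_terminal_his_tag(cds, n_his=6):
    """Insert n_his histidine codons immediately before the stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    stop = cds[-3:]
    if stop not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    return cds[:-3] + HIS_CODON * n_his + stop
```

Placing the tag before the stop codon keeps the reading frame intact, so the histidines are translated as the last six residues of the protein.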

To transform a host cell for underproduction of AGP, an antisense construct is
utilized. In this construct, the AGP cDNA is situated in the opposite direction of the
CaMV 35S promoter so that an antisense transcript is produced. This transcript
hybridizes to its corresponding sense mRNA eventually leading to the inhibition of
gene expression.
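
The relationship between the inverted cDNA and its antisense transcript can be shown with a short Python sketch. The function names and any sequence supplied to them are illustrative assumptions only; the sketch simply applies the standard base-pairing rules.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """An mRNA has the sequence of the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def antisense_transcript(cdna):
    """Transcript produced when the cDNA is placed in reverse orientation.

    Inverting the cDNA relative to the promoter makes the reverse complement
    the new coding strand, so the RNA is complementary to the sense mRNA.
    """
    return transcribe(reverse_complement(cdna))
```

Because the antisense transcript is the reverse complement of the sense mRNA, the two strands can pair along their full length in antiparallel orientation.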
CLAIMS: 

1. A cloned DNA molecule encoding a protein backbone of a plant arabinogalactan 
protein (AGP). 

2. A cloned DNA molecule encoding a protein backbone of an arabinogalactan 
protein from the plant family Solanaceae or Rosaceae. 

3. The cloned DNA molecule of claim 2 wherein said arabinogalactan protein is 
from Nicotiana or Pyrus.

4. The cloned DNA molecule of claim 3 wherein said arabinogalactan protein is 
from Nicotiana alata or Nicotiana plumbaginafolia. 

5. The cloned DNA molecule of claim 3 wherein said arabinogalactan protein is 
from Pyrus communis. 

6. The cloned DNA molecule of claim 3 wherein said arabinogalactan protein is 
from style of Nicotiana. 

7. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule is 
from Solanaceae and hybridizes to a nucleotide sequence encoding an amino acid 
sequence selected from the group consisting essentially of amino acid sequences 
SEQ ID NOS:11 and 26-30 or to a nucleotide sequence selected from the group 
consisting essentially of nucleotide sequences SEQ ID NOS:13, 14 and 21-25. 

8. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule 
consists essentially of the nucleotide sequence SEQ ID NO:24 or SEQ ID 
NO:25. 

9. The cloned DNA molecule of claim 7 wherein said cloned DNA molecule is a 
genomic AGP gene. 

10. The cloned DNA molecule of claim 6 wherein said cloned DNA molecule 
hybridizes to a nucleotide sequence encoding an amino acid sequence selected 
from the group consisting essentially of amino acid sequences SEQ ID NOS:50- 
60 or a nucleotide sequence selected from the group consisting essentially of 
nucleotide sequences SEQ ID NOS:61-63. 

11. The cloned DNA molecule of claim 6 wherein said cloned DNA molecule 
hybridizes to an RNA sequence transcribed from a DNA sequence encoding an 
amino acid sequence selected from the group consisting essentially of amino acid 
sequences SEQ ID NOS:50-60 and 67-70 or from a DNA sequence selected from 
the group consisting essentially of SEQ ID NOS:71-72. 

12. The cloned DNA molecule of claim 6 wherein said cloned DNA molecule 
consists essentially of nucleotide sequence SEQ ID NO:63 or SEQ ID NO:72. 

13. The cloned DNA molecule of claim 10 or 11 wherein said DNA molecule is a 
genomic AGP gene. 

14. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule is 
from Pyrus and hybridizes to a nucleotide sequence encoding an amino acid 
sequence selected from the group consisting essentially of amino acid sequences 
SEQ ID NOS:31-44 or to a nucleotide sequence selected from the group 
consisting essentially of nucleotide sequences SEQ ID NOS:45-49. 

15. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule is 
from Pyrus and hybridizes to an RNA sequence transcribed from a DNA 
sequence encoding an amino acid sequence selected from the group consisting 
essentially of amino acid sequences SEQ ID NOS:31-44 and 73-88 or from a 
DNA sequence selected from the group consisting essentially of SEQ ID 
NOS:64-66 and 89-91. 

16. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule 
consists essentially of a nucleotide sequence selected from the group consisting 
essentially of SEQ ID NOS:49, 66 and 91. 

17. The cloned DNA molecule of claim 14 or claim 15 wherein said DNA molecule 
is a genomic AGP gene. 

18. A DNA recombinant vector comprising a cloned DNA molecule of claim 1 or 
claim 2. 

19. A host cell transformed with a cloned DNA molecule of claim 18 so that a 
glycosylated or nonglycosylated arabinogalactan protein is expressed. 

20. The host cell of claim 19 wherein said host cell is a bacterial or plant cell.

21. A genetically-engineered DNA molecule comprising a plant arabinogalactan 
protein gene of claim 1 or 2 under control of a heterologous promoter so that a 
glycosylated or nonglycosylated arabinogalactan protein is expressed. 

22. The genetically-engineered DNA molecule of claim 21 wherein said 
arabinogalactan protein gene comprises a nucleotide sequence selected from a 
group consisting essentially of SEQ ID NOS:24, 25, 49, 63, 66, 72 and 91. 

23. A genetically engineered DNA molecule comprising a plant AGP promoter 
situated adjacent to a heterologous structural gene such that said structural gene 
is expressed under control of said plant AGP promoter. 

24. A substantially pure plant arabinogalactan protein. 

25. The plant arabinogalactan protein of claim 24 wherein said plant is from the 
family Solanaceae or Rosaceae. 

26. The substantially pure plant arabinogalactan protein of claim 24 consisting 
essentially of the amino acid sequence derived from a nucleotide sequence 
selected from a group consisting essentially of SEQ ID NOS:24, 25, 49, 63, 66, 
72 and 91. 

27. An antibody to a substantially pure plant arabinogalactan protein of claim 24. 

28. A method of obtaining a plant arabinogalactan gene comprising the step of 
utilizing an amino acid sequence from an isolated AGP peptide, or molecule 
thereof, to design a nucleotide sequence useful in screening a plant gene library 
for a hybridizing clone. 

29. A method of obtaining a plant arabinogalactan gene comprising the step of 
utilizing a hydroxyproline-rich amino acid sequence from an isolated AGP 
peptide, or molecule thereof, to design an RNA probe useful in screening a plant 
gene library for a hybridizing clone, wherein said hydroxyproline-rich amino 
acid sequence is enriched in hydroxyproline, alanine, serine, and threonine 
(OAST) content, and wherein said RNA probe comprises a nucleotide sequence 
containing a coding sequence of said hydroxyproline-rich amino acid sequence. 

30. A chemical or pharmacological reagent comprising a plant arabinogalactan 
protein produced from a cloned DNA molecule encoding a protein backbone of 
said plant AGP wherein said reagent is useful as an agent selected from the 
group consisting of an emulsifying agent, emulsion stabilizer, a thickening agent, 
a gelling agent, a texture modifier, a sizing agent, a binding agent, a coating 
agent, an adhesive agent, a dispersing agent, an encapsulating agent, a 
suspending agent, a lubricating agent, a coagulating agent and a combination 
thereof. 
AMENDED CLAIMS 

[received by the International Bureau on 22 May 1995 (22.05.95); 
original claims 1 and 24 amended; remaining 
claims unchanged (2 pages)] 

1. (Amended) A cloned DNA molecule encoding a protein backbone of a plant 
arabinogalactan protein (AGP) that is not an inducible nodulin protein. 

2. A cloned DNA molecule encoding a protein backbone of an arabinogalactan 
protein from the plant family Solanaceae or Rosaceae. 

3. The cloned DNA molecule of claim 2 wherein said arabinogalactan protein is 
from Nicotiana or Pyrus.

4. The cloned DNA molecule of claim 3 wherein said arabinogalactan protein is 
from Nicotiana alata or Nicotiana plumbaginafolia.

5. The cloned DNA molecule of claim 3 wherein said arabinogalactan protein is 
from Pyrus communis.

6. The cloned DNA molecule of claim 3 wherein said arabinogalactan protein is 
from style of Nicotiana.

7. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule is 
from Solanaceae and hybridizes to a nucleotide sequence encoding an amino acid 
sequence selected from the group consisting essentially of amino acid sequences 
SEQ ID NOS:11 and 26-30 or to a nucleotide sequence selected from the group 
consisting essentially of nucleotide sequences SEQ ID NOS:13, 14 and 21-25. 

8. The cloned DNA molecule of claim 2 wherein said cloned DNA molecule 
consists essentially of the nucleotide sequence SEQ ID NO:24 or SEQ ID 
NO:25. 

9. The cloned DNA molecule of claim 7 wherein said cloned DNA molecule is a 
genomic AGP gene. 

10. The cloned DNA molecule of claim 6 wherein said cloned DNA molecule 
hybridizes to a nucleotide sequence encoding an amino acid sequence selected 
from the group consisting essentially of amino acid sequences SEQ ID NOS:50- 
60 or a nucleotide sequence selected from the group consisting essentially of 
nucleotide sequences SEQ ID NOS:61-63. 

11. The cloned DNA molecule of claim 6 wherein said cloned DNA molecule 
hybridizes to an RNA sequence transcribed from a DNA sequence encoding an 
amino acid sequence selected from the group consisting essentially of amino acid 
sequences SEQ ID NOS:50-60 and 67-70 or from a DNA sequence selected from 
the group consisting essentially of SEQ ID NOS:71-72. 

24. (Amended) A substantially pure plant arabinogalactan protein that is not an 
inducible nodulin protein. 

25. The plant arabinogalactan protein of claim 24 wherein said plant is from the 
family Solanaceae or Rosaceae. 

26. The substantially pure plant arabinogalactan protein of claim 24 consisting 
essentially of the amino acid sequence derived from a nucleotide sequence 
selected from a group consisting essentially of SEQ ID NOS:24, 25, 49, 63, 66, 
72 and 91. 

27. An antibody to a substantially pure plant arabinogalactan protein of claim 24. 

28. A method of obtaining a plant arabinogalactan gene comprising the step of 
utilizing an amino acid sequence from an isolated AGP peptide, or molecule 
thereof, to design a nucleotide sequence useful in screening a plant gene library 
for a hybridizing clone. 

29. A method of obtaining a plant arabinogalactan gene comprising the step of 
utilizing a hydroxyproline-rich amino acid sequence from an isolated AGP 
peptide, or molecule thereof, to design an RNA probe useful in screening a plant 
gene library for a hybridizing clone, wherein said hydroxyproline-rich amino 
acid sequence is enriched in hydroxyproline, alanine, serine, and threonine 
(OAST) content, and wherein said RNA probe comprises a nucleotide sequence 
containing a coding sequence of said hydroxyproline-rich amino acid sequence. 

30. A chemical or pharmacological reagent comprising a plant arabinogalactan 
protein produced from a cloned DNA molecule encoding a protein backbone of 
said plant AGP wherein said reagent is useful as an agent selected from the 
group consisting of an emulsifying agent, emulsion stabilizer, a thickening agent, 
a gelling agent, a texture modifier, a sizing agent, a binding agent, a coating 
agent, an adhesive agent, a dispersing agent, an encapsulating agent, a 
suspending agent, a lubricating agent, a coagulating agent and a combination 
thereof. 